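arXivDaily: a daily academic digest synced with the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.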

2605.26288 2026-05-27 stat.ML cs.LG stat.ME

Beyond Differences: Doubly Robust Meta-Learners for Ratio-Based Treatment Effects

Michael Fuchs, Dominik Kreiss

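AI summary: For estimating ratio-based conditional average treatment effects (CATE), the paper proposes the Q-Learner, which decomposes the ratio into a product of two odds ratios, and derives doubly robust augmented versions. The methods perform well in low-conversion regimes and on confounded observational data.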

Comments 13+5 pages, 5 figures, 6 tables. Code: https://github.com/michaelfuchs90/ratiobasedcate

English abstract

When treatment effects are naturally expressed as ratios -- as in medicine, pricing, and marketing -- the ratio-based CATE $τ(x) = E[Y|W=1,X=x] / E[Y|W=0,X=x]$ is the appropriate estimand. Yet existing estimators either impose a log-linear parametric structure or apply generic regression without robustness guarantees for this functional. We introduce the Q-Learner, which decomposes $τ(x)$ into a product of two odds ratios, reducing ratio-CATE estimation for binary outcomes to two propensity classification tasks. We further derive doubly robust augmentations for both S/T- and Q-style ratio learners and characterize their distinct robustness properties. In benchmarks on seven RCT datasets, the Q-Learner is the most consistently competitive method in low-conversion regimes, where its propensity-only construction sidesteps the imbalanced regression that hurts outcome-based estimators. On four observational datasets, where propensity must be estimated and confounding cannot be ruled out, the DR learners introduced here decisively come out on top, making them practitioners' natural default for confounded observational data.
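One way to read the decomposition, offered as a sketch and not necessarily the paper's exact construction: for binary Y, Bayes' rule gives τ(x) = [P(W=1|Y=1,x) / P(W=0|Y=1,x)] × [P(W=0|x) / P(W=1|x)], that is, the treatment odds among converters times the inverse treatment odds overall. Both terms come from models that predict W, with no outcome regression. The counts below are invented for a single covariate stratum and only check the identity numerically.

```python
from fractions import Fraction

# Hypothetical counts for one covariate stratum x: counts[(w, y)] = number of units.
counts = {(1, 1): 30, (1, 0): 270, (0, 1): 10, (0, 0): 690}


def ratio_cate_direct(c):
    """tau = P(Y=1 | W=1) / P(Y=1 | W=0), computed from the two outcome rates."""
    p1 = Fraction(c[(1, 1)], c[(1, 1)] + c[(1, 0)])
    p0 = Fraction(c[(0, 1)], c[(0, 1)] + c[(0, 0)])
    return p1 / p0


def ratio_cate_via_odds(c):
    """Same quantity from two treatment-odds terms, with no outcome regression."""
    # Odds of treatment among converters: P(W=1 | Y=1) / P(W=0 | Y=1).
    odds_given_y1 = Fraction(c[(1, 1)], c[(0, 1)])
    # Inverse odds of treatment overall: P(W=0) / P(W=1).
    n_treated = c[(1, 1)] + c[(1, 0)]
    n_control = c[(0, 1)] + c[(0, 0)]
    inverse_odds = Fraction(n_control, n_treated)
    return odds_given_y1 * inverse_odds


tau_direct = ratio_cate_direct(counts)
tau_odds = ratio_cate_via_odds(counts)
```

In practice the two odds terms would be estimated by classifiers of W given X (one fitted on converters only), which is presumably what "two propensity classification tasks" refers to.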

2605.26286 2026-05-27 cs.MA cs.AI cs.RO

Decoupled Delay Compensation: Enhancing Pre-trained MARL Policies via Learned Dynamics Filtering

Maxim Mednikov, Oren Gal

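AI summary: To address the performance degradation of multi-agent reinforcement learning under delayed observations and communication latency, the paper proposes a modular execution-stage state-estimation layer that uses a learned gated transition model and recursive Kalman filtering to estimate the current state from asynchronous measurements. It acts as a plug-in for pre-trained policies and significantly improves robustness to communication delays and packet loss.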

Comments 8 pages, 7 figures

English abstract

Real-world multi-agent reinforcement learning (MARL) systems must often operate under stale observations, stochastic communication delays, and intermittent packet loss. Policies trained under idealized synchronous conditions frequently exhibit significant performance degradation in these regimes because they act on outdated feedback. We propose a modular execution-stage state-estimation layer that replaces delayed communicated observations with current belief-state estimates. The framework integrates a learned gated transition model with a recursive Kalman filtering layer to estimate instantaneous states from asynchronous measurements. A primary advantage of this approach is its modularity: the estimator serves as a plug-in for pre-trained policies, requiring no modifications to the original MARL training algorithm, architecture, or reward structure. Evaluation across diverse multi-agent and continuous-control benchmarks demonstrates that the proposed layer consistently enhances robustness to communication latency and message loss. The most significant performance gains are observed in coordination-intensive and dynamically unstable tasks where temporal consistency is critical for control.
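The idea of turning a stale measurement into a current belief can be illustrated with a scalar Kalman filter: fuse the delayed measurement at the time it refers to, then roll the belief forward through the transition model to the present. This is a minimal sketch under strong assumptions. The linear dynamics `x' = a * x` stand in for the paper's learned gated transition model, and all parameter values are invented.

```python
def kalman_update(mean, var, z, r):
    """Standard scalar Kalman measurement update with measurement-noise variance r."""
    gain = var / (var + r)
    return mean + gain * (z - mean), (1.0 - gain) * var


def predict(mean, var, a, q, steps=1):
    """Roll the belief forward through x' = a * x + noise(q) for `steps` steps."""
    for _ in range(steps):
        mean, var = a * mean, a * a * var + q
    return mean, var


def estimate_current_state(prior_mean, prior_var, delayed_z, delay, a, q, r):
    """Fuse a measurement that is `delay` steps old, then predict to the present."""
    mean, var = kalman_update(prior_mean, prior_var, delayed_z, r)
    return predict(mean, var, a, q, steps=delay)


# Hypothetical numbers: the prior belief refers to the time the measurement was taken.
mean_now, var_now = estimate_current_state(
    prior_mean=0.0, prior_var=1.0, delayed_z=2.0, delay=3, a=0.9, q=0.01, r=1.0
)
```

The pre-trained policy would then consume `mean_now` in place of the delayed observation, which is why no retraining is needed in this kind of design.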

2605.26271 2026-05-27 stat.ML cs.LG econ.EM

Learning Nonlinear Factor Models with Unknown Monotone Links from Incomplete and Noisy Data

Yutong Chao, Resat Gökhan, Jalal Etesami, Ali Habibnia

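AI summary: Studies the joint recovery of low-rank factors, loadings, and an unknown monotone link function from noisy and incomplete data, proposing a projected block coordinate descent algorithm with convergence guarantees.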

English abstract

We study a nonlinear factor model in which observed responses depend on low-rank latent factors through an unknown monotone link function. This setting is challenging and largely underexplored due to severe nonconvexity and identifiability issues. The link function is assumed to lie in a reproducing kernel Hilbert space (RKHS), enabling flexible nonparametric modeling while preserving identifiability. We formulate the problem as the joint recovery of the low-rank factors, loadings, and the nonlinear link function from possibly incomplete and noisy observations and propose a projected block coordinate descent (BCD) algorithm with explicit regularization to address scale and rotational ambiguities. Under mild incoherence of factors and standard sampling conditions, we establish convergence guarantees in both noiseless and noisy regimes, along with sublinear regret bounds for the link-function updates. Our results extend classical linear factor models to a broad nonlinear regime and provide a principled framework for learning nonlinear latent structures. We evaluate the proposed approach using controlled synthetic experiments, indicating promising performance.

2605.26203 2026-05-27 cs.MA cs.AI cs.CY cs.GT

AgentSociety: Incentivizing Agentic Social Intelligence

Aditya Vema Reddy Kesari, Krishna Reddy Kesari

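AI summary: Proposes AgentSociety, an incentive mechanism grounded in liquid democracy and information diffusion that lets decentralized agents collaborate autonomously, communicate strategically, and maximize their utility while reaching collective outcomes through consensus-based routing.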

English abstract

The success of deployed agents relies on their ability to handle open-ended user requests using their inherent capabilities, not only in solving requests directly but also in effectively leveraging inter-agent communication channels and feedback signals over time. This requires a multi-agent environment where agents can operate autonomously, communicate strategically, behave collaboratively, and be driven by economic incentives, much like humans in society. Towards this vision, we propose $\mathtt{AgentSociety}$, a mechanism that enables decentralized agentic collaboration grounded in liquid democracy and information diffusion from social choice theory. We show that $\mathtt{AgentSociety}$ provides an environment for agents to make autonomous decisions utilizing their local context to maximize their utility while achieving collective outcomes through incentivized collaboration. Specifically, we prove that delegation to more competent neighbor agents is incentive compatible and naturally generates multi-agent routing paths by consensus. Additionally, our mechanism incentivizes agents to selectively disclose information to their neighbor agents when doing so aligns with their self-interest, so as to garner influence. We characterize the Nash equilibrium, showing that agent payoffs reflect their marginal contributions. We compare and benchmark strategy profiles adopted by open and proprietary state-of-the-art language models deployed in $\mathtt{AgentSociety}$ against the best response. Finally, we evaluate collaborative performance from consensus-based routing among self-interested heterogeneous agents in $\mathtt{AgentSociety}$ on real-world datasets.

2605.26200 2026-05-27 cs.SE cs.AI

Workflow Closure Is Not Scientific Closure in Auto-Research Systems

Shuai Wang, Xinyuan Tian, Pangpang Liu, Yize Zhao

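AI summary: The paper argues that workflow closure in auto-research systems is not scientific closure, and proposes design remedies: autonomous execution under non-autonomous epistemic control, and avoiding objective collapse, validation collapse, and acceptance collapse.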

Comments 26 pages, 1 figure, 2 tables

English abstract

This paper argues that workflow closure is not scientific closure in auto-research systems. Current systems can increasingly complete research-like loops internally, moving from idea generation to experiment execution, writing, and self-evaluation. That achievement is real, but it does not by itself give the resulting outputs scientific standing. We argue that trustworthy auto-research should not aim for autonomous self-sufficiency, but should aim for autonomous execution under non-autonomous epistemic control. Based on a survey of more than 100 recent papers and repositories in this rapidly emerging area, together with a structured audit of 21 representative systems, we diagnose a recurring and structurally connected failure pattern: objective collapse, in which single-proxy targets replace multi-objective scientific aims; validation collapse, in which internal self-evaluation replaces independent validation; and acceptance collapse, in which benchmark scores or publication-shaped artifacts replace mechanisms for domain-level critique, reuse, and integration. These collapses are not inherent limits of autonomy but correctable design choices. Accordingly, we outline potential remedies across objective signal, validation, and output pathway to spark community discussion.

2605.26178 2026-05-27 cs.MA cs.LG

ATOM: Instantiating Budget-Controllable Multi-Agent Collaboration via Nucleus-Electron Hierarchy

Xinkui Zhao, Sai Liu, Yifan Zhang, Qingyu Ma, Zewen Lin, Naibo Wang, Guanjie Cheng, Chang Liu, Yueshen Xu

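AI summary: Proposes the ATOM framework, which uses a nucleus-electron hierarchy and task-driven reinforcement learning to generate budget-controllable collaboration graphs, improving token efficiency by up to 30% while maintaining performance.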

English abstract

Large Language Model (LLM)-based multi-agent systems rely on optimized collaboration topologies to balance performance and communication costs. However, current methods struggle with the inherent stability-extensibility trade-off and often misalign computational budgets with query difficulty. We propose ATOM, an adaptive framework that generates budget-controllable collaboration graphs via a novel task-driven reinforcement learning paradigm. Inspired by atomic structures, ATOM employs a nucleus-electron hierarchy: it maintains a stable, offline-learned collaboration backbone (the nucleus) while dynamically activating query-conditioned agents (electrons) during inference. Crucially, a complexity-aware budgeting strategy aligns resource consumption with task demands by estimating query difficulty to strictly regulate electron instantiation. Extensive experiments across six diverse benchmarks demonstrate that ATOM achieves state-of-the-art performance while improving token efficiency by up to 30% compared to strong baselines.

2605.26177 2026-05-27 cs.SE cs.AI

RepoMirage: Probing Repository Context Reasoning in Code Agents with Perturbations

Hanyu Li, Yichi Zhang, Speed Zhu, Hang Su, Jun Zhu, Yinpeng Dong

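AI summary: Proposes the RepoMirage evaluation suite, which uses semantics-preserving repository-level perturbations and extended tasks to reveal significant deficiencies in code agents' repository context reasoning, and demonstrates improvements with RepoAnchor, a structure-first prototype workflow.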

English abstract

Code agents now perform well on repository-level software engineering benchmarks, but it remains unclear whether success on end-to-end tasks such as issue resolution truly reflects repository context reasoning: the ability to identify task-relevant information across multiple files and reason over the relations among them. To investigate this question, we introduce RepoMirage, a two-stage evaluation suite built on SWE-Bench Verified that adopts perturbation as a diagnostic tool to increase the demand for context reasoning by transforming how the repository is exposed. First, RepoMirage-Perturb applies three types of semantics-preserving repository-level perturbations, revealing a clear performance drop when correct solving requires broader context access. RepoMirage-Extend further turns perturbation-targeted structural bottlenecks into explicit tasks beyond issue resolution, where the average performance declines from 66.8% in the original setting to 25.3%, indicating a significant deficiency in repository context reasoning. Further trajectory analysis reveals an exploration drift, where agents access broader repository context but fail to turn it into effective structural information. Motivated by this observation, we propose RepoAnchor, a structure-first prototype workflow that separates repository exploration from downstream problem solving, and show that explicit structural scaffolding yields notable gains. These results uncover a previously overlooked gap in repository context reasoning for code agents and suggest that stronger structure-aware methods have the potential to improve them.

2605.26174 2026-05-27 cs.SE cs.AI cs.CL cs.MA

A Universal Cliff and a Design Fingerprint: Cross-Section Defect Detection Under LLM Orchestration

Hiroki Fukui

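AI summary: The study shows that under LLM orchestration every model's ability to detect cross-section contradiction defects drops sharply (detection rates fall by two-thirds or more), and finds behavioral differences across alignment paradigms: one developer's models show a reporting-criterion shift as alignment strengthens.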

Comments 24 pages, 2 figures. Data and code: doi:10.5281/zenodo.20372696

English abstract

Production language-model systems answer a request by partitioning it across an invisible orchestration of worker agents that recompose one integrated report. We ask what this does to a class of defect no single worker can see: a contradiction in the relation between two distant sections of a document. Holding the documents, defects, mechanism, scoring, and seed fixed, we vary only the model -- ten systems across five generations from one developer and five providers from distinct alignment paradigms. Two layers separate. First, a universal detection cliff: every model that finds these cross-section defects under a single agent loses that ability under orchestration, detection falling two-thirds or more across every paradigm tested. The cliff is mechanism-derived and not closed by scale or extended reasoning. Second, how models behave once fallen. A signal-detection decomposition shows that, among the six models discriminating above chance, only one developer's generations move along the reporting-criterion axis: as alignment is strengthened, the model misses fewer defects yet raises more false alarms on clean documents -- two faces of one criterion shift, scaling with generation within that developer (p < 0.001) and near-absent elsewhere. At the floor the missed defect is often not out of view: the model's private record reconstructs the structural fault accurately, while the integrated report signs off on its soundness, its concern spent on the artifact and an absent collaborator. This resists quantification -- an automated judge is unstable (precision 17-50%) and keywords cannot separate it from ordinary agreement -- a resistance we report as a finding. We release all runs, probes, defect keys, scorer prompts, and scripts. An integrated report's confidence is uninformative about partition-spanning defects, the most aligned systems are not the safest, and the cliff is structural.
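The "criterion shift" language comes from signal-detection theory, where hit and false-alarm rates are split into a sensitivity index d' and a response criterion c. The sketch below uses the textbook equal-variance Gaussian formulas, d' = z(H) - z(F) and c = -(z(H) + z(F)) / 2. It is not the paper's scoring code, and the two rate pairs are invented to show two reporters with the same sensitivity but opposite criteria.

```python
from statistics import NormalDist


def sdt_decompose(hit_rate, false_alarm_rate):
    """Return (d_prime, criterion) under the equal-variance Gaussian model.

    Rates must lie strictly between 0 and 1, since z(0) and z(1) are infinite.
    """
    z = NormalDist().inv_cdf  # inverse standard-normal CDF
    z_hit, z_fa = z(hit_rate), z(false_alarm_rate)
    d_prime = z_hit - z_fa
    criterion = -0.5 * (z_hit + z_fa)
    return d_prime, criterion


# Hypothetical reporters: the liberal one misses fewer defects (higher hit rate)
# but raises more false alarms on clean documents.
d_liberal, c_liberal = sdt_decompose(hit_rate=0.9, false_alarm_rate=0.4)
d_conservative, c_conservative = sdt_decompose(hit_rate=0.6, false_alarm_rate=0.1)
```

Here both reporters have the same d', so the difference between them is purely a move along the criterion axis: fewer misses and more false alarms are "two faces of one criterion shift".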

2605.26168 2026-05-27 cs.OS cs.LG

LearnedCache: An eBPF-Integrated Perceptron-Based Eviction Policy for the Linux Page Cache

Zejia Qi

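AI summary: Proposes LearnedCache, a Linux page cache eviction policy based on eBPF and a single-layer perceptron, with models trained on real kernel data, achieving up to a 10% improvement in insertion rate under representative workloads.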

Comments 11 pages, 12 figures, 4 listings. Policies and harnesses: https://github.com/JayAndJef/cache_ext_lc . Model and visualizations: https://github.com/JayAndJef/learnedcache

English abstract

Linux is the foundation of the digital age, accounting for the majority of the cloud and mobile OS markets. Any device that runs Linux uses the Linux page cache, a central pillar in OS and application performance, serving to reduce extraneous disk access. Many page cache eviction policies have been developed but remain bound by the rigidity of heuristics. The rise of AI-driven tools in recent years, melded with the ever-increasing variety of workloads for Linux devices, sets the stage for machine-learning-driven cache eviction policies. Promising research has been done in this field, but only for user-space applications such as CDNs. We develop LearnedCache, an eBPF-integrated single-layer perceptron-based cache eviction policy for the Linux page cache, trained on real kernel data from diverse workloads. We demonstrate median AUCs of nearly 80% over multiple linear models modeling page reuse time, then take a step further by embedding these models inside the Linux kernel for real-time performance evaluation. Through statistical testing over 50 paired trials against a baseline of FIFO for each workload, LearnedCache reveals that machine-learning-derived cache eviction policies are practical in the Linux kernel under representative empirical workloads and are able to surpass conventional FIFO by statistically significant margins of up to 10% in insertion rate, a frequency-adjusted derivation of cache hit rate, in specific workloads while incurring minimal overhead.
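A single-layer perceptron used as an eviction policy amounts to a weighted sum over per-page features, with the lowest-scoring page evicted first. The following is a user-space Python sketch of that idea only. The two features, the weights, and the names `Page`, `reuse_score`, and `pick_victim` are all hypothetical, and none of this is the paper's eBPF code. Integer weights are used because eBPF does not support floating-point arithmetic, so a kernel-side model would presumably be quantized in some similar way.

```python
from dataclasses import dataclass


@dataclass
class Page:
    page_id: int
    accesses: int  # hypothetical feature: access count since insertion
    age: int       # hypothetical feature: ticks since last access


# Hypothetical integer weights (accesses raise the score, age lowers it).
WEIGHTS = (4, -1)
BIAS = 0


def reuse_score(page):
    """Perceptron output: higher means the page looks more likely to be reused soon."""
    return WEIGHTS[0] * page.accesses + WEIGHTS[1] * page.age + BIAS


def pick_victim(pages):
    """Evict the page with the lowest predicted reuse score."""
    return min(pages, key=reuse_score)


pages = [
    Page(1, accesses=5, age=2),
    Page(2, accesses=1, age=9),
    Page(3, accesses=3, age=1),
]
victim = pick_victim(pages)
```

FIFO, the baseline in the abstract, would instead evict the oldest inserted page regardless of these features.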

2605.26166 2026-05-27 cs.CR cs.AI cs.LG

Enhancing Autonomous Online Intrusion Detection for IoT with Balanced Learning, Reliable Pseudo-Labels, and Lightweight Architectures

Hanzala Afzaal, Danish Memon, Chouhdary Bilal Raza, Muhammad Khurram Shahzad

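AI summary: To address four shortcomings of AOC-IDS (class imbalance, unreliable pseudo-labels, poor generalization, and high computational overhead), the paper proposes XGBoost-BalSamp, PseudoFilter, MixupAug, and LiteAE, raising accuracy on UNSW-NB15 to 95.45% (XGBoost-BalSamp) and cutting model parameters by 55% (combined deep learning approach).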

Comments 9 pages, 5 figures; Code available at https://github.com/danishmemon847/AOC-IDS-Pipeline

English abstract

The rapid proliferation of Internet of Things (IoT) devices has created an urgent demand for adaptive, resource-efficient Intrusion Detection Systems (IDS) capable of handling dynamic and evolving cyber threats. This paper investigates AOC-IDS, a state-of-the-art autonomous online IDS published at IEEE INFOCOM 2024, which employs an Autoencoder (AE) with Cluster Repelling Contrastive (CRC) loss and an autonomous Gaussian-based decision module. We first successfully replicate AOC-IDS on the UNSW-NB15 benchmark, achieving 89.39% accuracy in close agreement with the published 89.19%. We then identify four key limitations: class imbalance, unreliable pseudo-label generation, limited generalization, and computational overhead for IoT deployment, and propose targeted improvements for each. Our XGBoost-BalSamp method achieves 95.45% accuracy on UNSW-NB15, a gain of 6.26% over the baseline. Our combined deep learning approach (PseudoFilter, MixupAug, and LiteAE) achieves a best-run accuracy of 90.88% (F1: 91.45%), surpassing the base paper while reducing model parameters by 55%. These results demonstrate that targeted improvements to AOC-IDS yield consistent accuracy gains while improving practical deployability on IoT edge devices.
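The abstract does not spell out MixupAug, but the name suggests the standard mixup augmentation, in which a synthetic training example is a convex combination of two real examples and of their labels. The sketch below shows plain mixup under that assumption. It is not the authors' implementation, and the feature vectors are invented.

```python
import random


def mixup(x_i, y_i, x_j, y_j, lam):
    """Blend two examples: lam * (x_i, y_i) + (1 - lam) * (x_j, y_j)."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x, y


def sample_lambda(alpha, rng):
    """Standard mixup draws the mixing weight from Beta(alpha, alpha)."""
    return rng.betavariate(alpha, alpha)


# Hypothetical flow features with one-hot labels (benign, attack).
benign_x, benign_y = [1.0, 0.0], [1.0, 0.0]
attack_x, attack_y = [0.0, 2.0], [0.0, 1.0]

mixed_x, mixed_y = mixup(benign_x, benign_y, attack_x, attack_y, lam=0.25)
lam_random = sample_lambda(alpha=0.4, rng=random.Random(0))
```

With `lam=0.25` the mixed example is one quarter benign and three quarters attack, and its soft label says so, which is the regularizing effect mixup relies on.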

2605.26165 2026-05-27 cs.SE cs.AI cs.CL

Tool-Schema Compression Enables Agentic RAG Under Constrained Context Budgets

Furkan Sakizli

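AI summary: To address the competition between tool schemas and retrieved context in agentic RAG systems, the paper proposes tool-schema compression, which raises average exact match by 20.5 percentage points at an 8K context budget, and verifies that compressed schemas remain operational beyond 800 tools.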

Comments 12 pages (8 main + 4 appendix), 7 tables, 2 figures. Code and data: https://github.com/SKZL-AI/tscg

DOI: 10.5281/zenodo.20369668

English abstract

Agentic RAG systems that equip language models with dozens to hundreds of tool definitions face a critical resource conflict: tool schemas consume the same context window needed for retrieval-augmented generation. We present the first systematic study of this tool-context trade-off, evaluating 14 models spanning 1.5B-32B local models plus one frontier API model across 6,566 controlled API calls at three context budgets (8K, 16K, 32K) with 28 tool definitions. Applying TSCG conservative-profile compression (44-50% schema token savings), we observe a binary enablement effect: at 8K tokens, JSON-schema tool definitions overflow the context window entirely, yielding near-zero EM (2.6% average), while compressed schemas restore RAG functionality with +20.5 pp average exact-match lift across all eight models (+24.7 pp among the six exhibiting full enablement). At 32K -- where both formats fit -- four of five tested models show delta <= 1 pp, confirming the effect is purely budget-driven. External validation on HotpotQA (50 multi-hop questions) shows +48 pp EM under the same overflow scenario. Frontier scaling tests demonstrate that JSON schemas overflow at ~494 tools while compressed schemas remain operational beyond 800 tools. Our results establish tool-schema compression as a necessary infrastructure layer for agentic RAG in constrained-context deployments. All code, data, and checkpoints are publicly available.

2605.26163 2026-05-27 cs.IT cs.LG math.IT math.OC

Adversarial Water-Filling: Theory, Algorithms and Foundation Model

Xindi Tong, Chee Wei Tan, H. Vincent Poor

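AI summary: For competitive resource allocation over frequency and space, the paper formulates the Adversarial Water-Filling (AWF) problem with accompanying theory and algorithms, and develops a wireless foundation model that learns the AWF search dynamics, achieving runtime improvements of more than an order of magnitude.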

Comments Submitted to IEEE Journal of Selected Topics in Signal Processing

English abstract

Competitive resource allocation problems over frequency and space can be formulated as a minimax interaction between transmit power and worst-case interference. This formulation naturally arises in multi-operator low Earth orbit (LEO) satellite spectrum sharing, where transmissions from competing constellations interfere in real time. Under Gaussian channels, the resulting Adversarial Water-Filling (AWF) problem is strongly convex-concave on nondegenerate active channels, whereas discrete constellations yield generally nonconvex mercury/water-filling formulations. In this paper, we propose the AWF problem with corresponding theory and algorithms for these real situations. In addition, we develop a wireless foundation model for AWF to learn the AWF search dynamics. The architecture incorporates permutation-invariant channel representations, a constraint-aware graph neural network (GNN) with sparse message passing, and global latent variables capturing the low-dimensional water level implied by the AWF optimality. Through learned projected extragradient iterations, the model approximates stationary solutions of the constrained minimax problem arising under mercury/water-filling. We further show that, under local regularity and contractivity conditions, the learned AWF dynamics converge locally linearly around regular stationary points. Experiments demonstrate empirical generalization across unseen problem sizes, different constraints, and multiple discrete constellations, while achieving more than one-order-of-magnitude runtime improvements over iterative baselines. The related code can be found at https://github.com/convexsoft/AWF.
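For readers unfamiliar with the "water level" mentioned above, the classical single-user water-filling solution is a useful reference point: with effective noise levels n_i and a power budget P, the optimal allocation is p_i = max(0, μ - n_i), where the scalar water level μ is chosen so that the powers sum to P. The sketch below solves for μ by bisection. It is the textbook non-adversarial problem, not the AWF algorithm or the learned model from the paper, and the noise values are invented.

```python
def water_fill(noise, total_power, iters=100):
    """Classical water-filling: return (water_level, per-channel powers).

    `noise` holds effective noise levels (noise power divided by channel gain).
    """
    # The water level lies between the quietest channel and max(noise) + budget.
    lo, hi = min(noise), max(noise) + total_power
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        used = sum(max(0.0, mu - n) for n in noise)
        if used > total_power:
            hi = mu  # too much water poured: lower the level
        else:
            lo = mu  # budget not exhausted: raise the level
    mu = 0.5 * (lo + hi)
    return mu, [max(0.0, mu - n) for n in noise]


# Hypothetical three-channel example with a power budget of 3.
level, powers = water_fill(noise=[1.0, 2.0, 5.0], total_power=3.0)
```

In this example the noisiest channel sits above the water level and receives no power, which is the sparsity pattern that makes a single scalar μ summarize the whole allocation.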

2605.26159 2026-05-27 cs.NI cs.CR cs.LG

Device Context Protocol: A Compact, Safety-First Architecture for LLM-Driven Control of Constrained Devices

Dongxu Yang

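AI summary: To address the safety problem of LLM control of constrained devices, the paper proposes the Device Context Protocol (DCP), which combines very small frame overhead, protocol-layer safety primitives, and a host-side Bridge to defend against hallucinated calls and prompt injection while keeping a low resource footprint.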

Comments 15 pages, 5 figures. Reference implementation, Python package (pip install pydcp), and reproduction scripts at https://github.com/device-context-protocol/dcp

English abstract

Large language models are increasingly used as orchestrators of external tools via the Model Context Protocol (MCP), but MCP is built for software services with megabytes of memory and does not descend to the microcontrollers that dominate the long tail of physical devices. Recent work (IoT-MCP) ports MCP to edge gateways at 74 KB peak memory; this still excludes the smallest commodity MCUs and, critically, does not address the safety problem of giving an unreliable caller (an LLM that may hallucinate or be prompt-injected) direct control of physical hardware. We present the Device Context Protocol (DCP): a sub-50-byte typical frame (6-byte header + CBOR payload + optional 16-byte HMAC), a manifest schema in which capability scoping, range and type checks, dry-run evaluation, and units-as-types are protocol-layer primitives, and a host-side Bridge that rejects malformed or hallucinated calls before any byte reaches the device. Reference firmware measures 27.6 KB flash / 0.6 KB RAM on ESP32; the Python Bridge, ESP32 firmware, and a language-neutral conformance suite are MIT-licensed and public. An empirical study -- 675 tool calls produced by five LLMs across four vendors (DeepSeek, Alibaba, Zhipu, MiniMax) against six categories of adversarial prompts, with the injection category instantiating AgentDojo's attack templates -- shows DCP rejects 100% of capability-escalation attempts and 78% of prompt-injection attempts, versus 0--1% for Raw MCP and IoT-MCP, matching the expressiveness of a well-formed OpenAPI 3 schema at three orders of magnitude less firmware footprint. We position DCP as the missing layer between MCP (which is moving toward enterprise SaaS connectivity) and the physical devices it does not reach.
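The frame budget described above (6-byte header + CBOR payload + optional 16-byte HMAC) can be made concrete with a small sketch. Everything about the layout here is an assumption for illustration: the header fields, the `HEADER_FORMAT` string, the use of HMAC-SHA256 truncated to 16 bytes, and the helper names are hypothetical and are not taken from the DCP specification or the `pydcp` package. The payload is a hand-encoded CBOR map `{"v": 42}` so that no third-party CBOR library is needed.

```python
import hashlib
import hmac
import struct

# Hypothetical 6-byte header: version, flags, capability id, payload length.
HEADER_FORMAT = ">BBHH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)
TAG_SIZE = 16  # assumed: HMAC-SHA256 output truncated to 16 bytes


def build_frame(capability_id, payload, key=None):
    """Assemble header + payload, appending an authentication tag if a key is given."""
    flags = 0x01 if key is not None else 0x00
    header = struct.pack(HEADER_FORMAT, 1, flags, capability_id, len(payload))
    frame = header + payload
    if key is not None:
        frame += hmac.new(key, frame, hashlib.sha256).digest()[:TAG_SIZE]
    return frame


def in_declared_range(value, low, high):
    """Host-side check in the spirit of the Bridge: reject wrong types and out-of-range values."""
    return type(value) is int and low <= value <= high


# CBOR bytes for {"v": 42}: map(1), text(1) "v", unsigned int 42.
payload = bytes([0xA1, 0x61, 0x76, 0x18, 0x2A])
signed_frame = build_frame(capability_id=7, payload=payload, key=b"demo-key")
plain_frame = build_frame(capability_id=7, payload=payload)
```

With this toy layout a signed command is 27 bytes, in line with the sub-50-byte figure quoted in the abstract. A call such as `in_declared_range(300, 0, 100)` returning `False` is the kind of rejection that would happen on the host before any byte is sent to the device.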

2605.26158 2026-05-27 cs.CR cs.AI cs.LG

Furina: Fragmented Uncertainty-Driven Refusal Instability Attack

Tongxi Wu, Jian Zhang, Yang Gao

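AI summary: Shows that the safety behavior of large language models has an instability region, proposes a multi-metric diagnostic framework, and develops the Furina attack, which uses fragmented, scene-anchored prompts to induce uncertainty amplification and achieve effective jailbreaks.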

Comments This work is accepted as a regular paper at ICML 2026

English abstract

Safety alignment in large language models (LLMs) and multimodal large language models (MLLMs) is commonly assumed to operate as a near-binary threshold mechanism. We challenge this assumption by revealing that safety behavior is governed by an instability region where small perturbations induce stochastic refusal decisions rather than deterministic outcomes. We develop a multi-metric diagnostic framework combining external and internal signals to characterize this instability. Through systematic experiments, we identify a characteristic diagnostic signature: inputs in unstable regimes exhibit elevated output uncertainty yet decreased internal safety activation, a decoupling phenomenon that explains why detection-based defenses fail against sophisticated attacks. Building on this framework, we introduce Furina, a jailbreak attack that deliberately induces this signature through fragmented, scene-anchored prompts without model-specific optimization. Furina outperforms strong single-turn and multi-turn baselines on HarmBench and achieves competitive results on MM-SafetyBench, demonstrating that uncertainty amplification provides a principled and transferable mechanism for understanding safety vulnerabilities. Code is available at: https://github.com/0xCavaliers/Furina_Jailbreak.

2605.26154 2026-05-27 cs.CR cs.AI

MemMorph: Tool Hijacking in LLM Agents via Memory Poisoning

Xuanye Zhang, Yongsen Zheng, Zhuqin Xu, Kaiyu Zhou, Bowen Shen, Haoran Ou, Tianwei Zhang, Kwok-Yan Lam

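AI summary: Proposes the MemMorph attack, which injects a small number of disguised records into long-term memory to induce an LLM agent to autonomously select the attacker's preferred tool, achieving an attack success rate of up to 85.9% across multiple benchmarks.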

Comments Preprint. Under review

AI Chinese abstract

LLM驱动的代理能够选择外部工具来完成用户任务。然而，攻击者可能破坏这一过程，引导代理使用不当/错误的工具并实施恶意行为。现有攻击主要操纵工具元数据，这容易被审计检测，并且随着现代代理越来越多地采用记忆模块通过积累经验来优化工具选择策略，这些攻击可能失效。本文提出MemMorph，这是首次通过投毒代理的长期记忆来偏置工具选择的攻击。MemMorph不直接指定工具调用决策，而是注入少量伪装成技术事实、事件报告和操作策略的精心构造记录。这些投毒记录重塑了代理的上下文感知和决策过程，使其自主推断并选择攻击者偏好的工具。在3个基准测试、10个代理骨干和3个记忆模块实现上的实验表明，MemMorph仅需注入3条记录即可达到高达85.9%的攻击成功率，在3种代表性防御下仍保持效力，比最强基线高出25%。我们的发现揭示了长期记忆是工具增强代理中一个关键且被忽视的攻击面，敦促开发记忆层面的完整性保障。

English abstract

LLM-driven agents are capable of selecting external tools to complete users' tasks. However, attackers could compromise this process, steering agents toward inappropriate or wrong tools and enabling malicious actions. Most existing attacks manipulate the tool metadata, which is easily detectable by auditing and may lose effectiveness as modern agents increasingly adopt memory modules to refine tool selection policies through accumulated experience. This paper proposes MemMorph, the first attack that biases tool selection by poisoning the agent's long-term memory. Rather than explicitly dictating the tool invocation decision, MemMorph injects a small number of crafted records that are disguised as technical facts, incident reports, and operational policies. These poisoned records reshape the agent's contextual perception and decision-making process, leading it to autonomously infer and select the tool preferred by the attacker. Experiments across 3 benchmarks, 10 agent backbones, and 3 memory-module implementations show that MemMorph achieves up to 85.9% attack success rate with only three injected records, outperforming the strongest baseline by up to 25% while retaining potency under 3 representative defenses. Our findings expose long-term memory as a critical and under-explored attack surface in tool-augmented agents, urging the development of memory-level integrity safeguards.

2605.26151 2026-05-27 physics.med-ph cs.RO

Towards Real-World Identification of Fatigued Muscle Groups via Musculoskeletal Simulation

面向真实世界中疲劳肌肉群识别的肌肉骨骼仿真方法

Jenishkumar Chauhan, Samarth Brahmbhatt, Vineet Vashista

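AI summary: Proposes an algorithm that identifies fatigued upper-limb muscle groups without contact by comparing real free-space motion against a simulated musculoskeletal model; experiments show it reliably distinguishes multiple fatigued muscle groups, and the work shows how to configure advanced simulators to narrow the sim-to-real gap.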

Comments Video File: https://www.youtube.com/watch?v=scvi3DCD9UY

AI Chinese abstract

无接触诊断肌肉骨骼疾病有望改善人口健康以及协作环境中的机器人行为。然而，当前的诊断方法需要现场体检，由训练有素的医生通过接触感知各肌肉施加的力。虽然存在仿真工具，但将其用于真实数据诊断的研究尚不充分。本文提出一种识别上肢哪个肌肉群疲劳的算法。该算法将受试者的真实自由空间运动与仿真的肌肉骨骼模型运动进行比较，因此是无接触的：避免了侵入式传感或现场评估的需要。我们的算法使用基于物理的肌肉骨骼模型模拟各种疲劳条件，并从真实和仿真数据中提取诊断运动特征，进行比较以进行诊断。在真实数据上的实验结果表明，所提方法能够可靠地区分多个疲劳肌肉群。此外，通过全面的性能比较，我们展示了如何正确配置最新的先进肌肉骨骼仿真器，以解决疲劳诊断任务中的仿真到现实差距。我们的方法有望推动远程和自动化诊断的进一步研究，显著降低大规模早期检测的门槛。

English abstract

Contactless diagnosis of musculoskeletal disorders can potentially improve population health as well as robot behaviours in collaborative settings. However, current diagnosis methods require an in-person physical examination in which a trained physician senses, through contact, the force applied by various muscles. Simulation tools exist, but their use for diagnosis with real data is under-explored. In this paper, we propose an algorithm for identifying which upper-limb muscle group is fatigued. Our algorithm compares the real-world free-space motion of the subject with that of a simulated musculoskeletal model, and is therefore contactless: it avoids the need for invasive sensing or in-person assessment. Our algorithm simulates various fatigue conditions using a physics-based musculoskeletal model and extracts diagnostic motion features from both real and simulated data, which are compared for diagnosis. Experimental results on real data demonstrate that the proposed method can reliably distinguish between multiple fatigued muscle groups. Additionally, through comprehensive performance comparisons, we show how recent advanced musculoskeletal simulators can be properly configured to address the sim-to-real gap in the context of the fatigue diagnosis task. Our approach can potentially spur further research in remote and automated diagnosis, significantly lowering the barrier to large-scale and early detection.

2605.26149 2026-05-27 cs.GR cs.CV

AnySurf: Any Surface Generation with Directed Edge

AnySurf: 基于有向边的任意表面生成

Wenda Shi, Chenyuan Pan, Dengming Zhang, Yiren Song, Biao Zhang, Xingxing Zou

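AI summary: Proposes AnySurf, a unified framework that uses a directed-edge enhanced Flexible Dual Grid representation to generate high-quality open, closed, and hybrid 3D surfaces, and introduces ROS-FT post-training and a lightweight DE-Adapter to preserve generation performance.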

AI Chinese abstract

开放表面组件在真实工业3D内容中普遍存在，支持渲染、物理模拟和几何编辑。服装作为典型的开放表面类型，现有许多生成方法利用缝纫图案生成2D面板并缝合为3D形状。这种特定领域的设计缺乏可扩展性，无法泛化到鞋子和配饰。常见的基于场的3D生成器优先考虑水密网格，并倾向于在开放表面上创建有缺陷的双层结构。尽管Trellis2采用了无场表示，但其开放表面结果仍存在法线和拓扑错误。我们提出AnySurf，一个统一框架，生成具有准确面朝向的开放、封闭和混合3D表面。基于有向边增强的柔性双网格（FDG-D），我们的表示通过定向网格边保留法线方向信息。我们还提出了ROS-FT后训练和仅增加1%额外参数的轻量级DE-Adapter，促进有向边学习同时保持原始生成性能。我们进一步构建了包含工业服装和封闭配件的Outfit3D数据集。我们的工作将服装建模转化为通用的3D生成任务。实验结果表明，在网格质量和下游应用实用性方面具有优越性。

English abstract

Open surface components prevail in real industrial 3D content and support rendering, physical simulation and geometric editing. Garments serve as a typical open surface type, with numerous existing generation methods leveraging sewing patterns to generate 2D panels and stitch them into 3D shapes. Such domain-specific designs lack scalability and cannot generalize to shoes and accessories. Common field-based 3D generators prioritize watertight meshes and tend to create flawed double-layer structures on open surfaces. Though Trellis2 adopts a field-free representation, its open surface results still contain normal and topology errors. We present AnySurf, a unified framework generating open, closed and hybrid 3D surfaces with accurate face orientation. Built on the directed-edge enhanced Flexible Dual Grid (FDG-D), our representation retains normal direction information via oriented grid edges. We also propose ROS-FT post-training and a lightweight DE-Adapter with merely 1% extra parameters, facilitating directed edge learning while preserving original generation performance. We further construct the Outfit3D dataset containing industrial garments and closed accessories. Our work transforms garment modeling into a universal 3D generation task. Experimental results demonstrate superior mesh quality and better practicality for downstream applications.

2605.26146 2026-05-27 cs.SE cs.AI cs.HC

Augment Engineering: A Methodology for Multi-Tool AI Orchestration Across Professional Domains

增强工程：跨专业领域的多工具AI编排方法论

Elias Calboreanu

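AI summary: Proposes the discipline of Augment Engineering, which orchestrates multiple purpose-built AI tools across domains using the portable skills of prompt engineering and context engineering, and reports exploratory supporting evidence from a single-practitioner case study.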

Comments 60 pages, 5 figures, 7 tables. Companion to arXiv:2604.04258 (Context Engineering). Formatted for the Journal of Systems and Software (In Practice track)

AI Chinese abstract

组织越来越多地在专业领域部署独立的专用AI工具，通常为每个工具雇佣领域专家，这重现了AI本应转变的人员配置模式。然而，使这些工具有效的元技能——提示工程（交互级优化）和上下文工程（结构化输入流水线设计）——是可跨领域移植的：掌握这些技能的实践者可以将其应用于任何领域的任何专用AI工具。本文将增强工程定义为跨不同专业领域编排多个专用AI工具的学科，应用提示工程和上下文工程作为可跨工具边界转移的可移植能力。我们提出一个六阶段编排方法论和四个可移植性指标。一个为期5个月的形成性案例研究（2025年11月至2026年3月）记录了一位实践者将这些技能应用于跨越七个专业领域的十个组件编排栈，产出了传统上需要不同领域专家才能完成的工作产品。两个定量观察与框架预测一致：Cochran-Armitage趋势检验（n=200次交互，跨两个聊天LLM，p<0.01）显示首次接受率随提示复杂度水平上升；Wright定律拟合（n=82个工件，p<0.01）显示工件组合的生产加速。由于所有观察来自单一位实践者，推断统计是探索性和假设生成的，而非确认性的；整个组合的可移植性有待多实践者复制。增强工程完成了三个学科的演进：提示工程（一个工具）、上下文工程（可复现流水线）、增强工程（跨领域工具组合）。

English abstract

Organizations increasingly deploy separate purpose-built AI tools across professional domains, often hiring domain specialists for each, recreating the staffing models AI was expected to transform. Yet the meta-skills that make these tools effective, prompt engineering (interaction-level optimization) and context engineering (structured input pipeline design), are domain-portable: a practitioner who masters them can apply them to any purpose-built AI tool in any domain. This paper defines Augment Engineering as the discipline of orchestrating multiple purpose-built AI tools across distinct professional domains, applying prompt and context engineering as portable competencies that transfer across tool boundaries. We present a six-phase orchestration methodology and four portability metrics. A 5-month formative case study (November 2025 to March 2026) documents a single practitioner applying these skills across a ten-component orchestration stack spanning seven professional domains, producing work products that would traditionally involve separate domain specialists. Two quantitative observations are consistent with the framework's predictions: a Cochran-Armitage trend test (n = 200 interactions across two chat LLMs, p < 0.01) shows first-pass acceptance rising with prompt-sophistication level, and a Wright's Law fit (n = 82 artifacts, p < 0.01) shows production acceleration across the artifact portfolio. Because all observations come from a single practitioner, the inferential statistics are exploratory and hypothesis-generating rather than confirmatory; portability across the full portfolio awaits multi-practitioner replication. Augment Engineering completes a three-discipline progression: Prompt Engineering (one tool), Context Engineering (reproducible pipelines), Augment Engineering (a portfolio of tools across domains).
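
The Wright's Law fit mentioned in the abstract can be illustrated with a least-squares fit in log-log space. This is only a sketch of how such a fit might be done, not the paper's procedure: the function name `fit_wrights_law` and the noise-free synthetic data (82 points, chosen to mirror n = 82) are assumptions made for illustration.

```python
import math

def fit_wrights_law(units, costs):
    """Fit cost = a * unit**b by ordinary least squares in log-log space.

    Returns (a, b). A negative b means each artifact takes less effort
    as cumulative production grows.
    """
    xs = [math.log(u) for u in units]
    ys = [math.log(c) for c in costs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope = learning exponent
    a = math.exp(mean_y - b * mean_x)  # intercept = cost of the first unit
    return a, b

# Synthetic data (hypothetical): 10 hours for the first artifact,
# learning exponent -0.3, no noise.
units = list(range(1, 83))
costs = [10.0 * u ** -0.3 for u in units]
a, b = fit_wrights_law(units, costs)
```

On real effort logs the fit would be noisy, and with a single practitioner the slope is best read as exploratory, in line with the abstract's own caveat.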

2605.26137 2026-05-27 cs.GR cs.AI cs.CV

AssetGen: Deployable 3D Asset Generation at Interactive Speed

AssetGen: 可部署的交互速度3D资产生成

Dilin Wang, Xiaoyu Xiang, Kihyuk Sohn, Tom Monnier, Yu-Ying Yeh, Thu Nguyen-Phuoc, Jiawen Zhang, Yuchen Fan, Antoine Toisoul, Hyunyoung Jung, Prithviraj Dhar, Michael Bunnell, Nikolaos Sarafianos, Chuhang Zou, Roman Shapovalov, Andrea Vedaldi, Rakesh Ranjan

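AI summary: Proposes the AssetGen system, which combines a coarse-to-refine VecSet framework, multi-view texture generation, and end-to-end acceleration to generate, within 30 seconds, a high-quality mesh with baked normals, a color texture, and a controllable polygon budget, supporting real-time rendering and mobile deployment.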

AI Chinese abstract

尽管3D生成技术正在快速发展，但近期工作通常侧重于获取高分辨率资产，而将用户体验和可部署性视为事后考虑。我们提出AssetGen，一个专注于这两个方面的3D生成器。给定一张参考图像，它在30秒内生成一个高质量网格，带有烘焙法线、颜色纹理和可控多边形预算，适用于实时渲染，包括移动端用例。AssetGen Flash变体进一步将延迟降低到14秒，适用于交互式和代理式创作循环。我们的模型使用粗到细的VecSet框架生成物体几何，该框架在GPU上实现网格简化、清理和法线烘焙，以及快速并行UV展开。然后以多视图方式生成纹理，随后进行反投影和3D修复。模型蒸馏、内核优化和流水线并行化被协同设计以加速整个系统。我们引入了大量自动化和盲人机评估，并在30秒内展示了与领先商业解决方案相当的视觉质量，在不到15秒内展示了预览质量的结果。最终结果是一个支持AI辅助、可部署的3D内容创建的系统，适用于交互式工作流。

English abstract

While 3D generation is progressing rapidly, recent work has often focused on obtaining high-resolution assets, leaving user experience and deployability as afterthoughts. We present AssetGen, a 3D generator that focuses instead on these two aspects. Given one reference image, in 30 seconds it produces a high-quality mesh with baked normals, a color texture, and a controlled polygon budget suitable for real-time rendering, including mobile use cases. The AssetGen Flash variant further reduces latency to 14 seconds for interactive and agentic creation loops. Our model generates the object geometry with a coarse-to-refine VecSet framework, which implements mesh simplification, cleaning, and normal baking on the GPU, and a fast parallel UV unwrapping. It then generates textures in a multi-view fashion, followed by backprojection and 3D inpainting. Model distillation, kernel optimization, and pipeline parallelization are co-designed to accelerate the system end-to-end. We introduce numerous automated and blind human evaluations and demonstrate competitive visual quality against leading commercial solutions in 30 seconds and preview-quality results in less than 15 seconds. The final result is a system that supports AI-assisted, deployable 3D content creation in interactive workflows.

2605.26127 2026-05-27 physics.med-ph cs.LG eess.IV

Rapid online deep artifact suppression for real-time spiral bSSFP CMR with blipped-CAIPI simultaneous multi-slice imaging at 1.5 T

1.5 T 下使用 blipped-CAIPI 同步多层成像的实时螺旋 bSSFP CMR 的快速在线深度伪影抑制

Julius Åkesson, Iulius Dragonu, Einar Heiberg, Tina Yao, Rebecca Baker, Ruta Virsinskaite, Daniel Knight, Vivek Muthurangu, Jennifer Steeden

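AI summary: To address the long acquisition and reconstruction times of real-time simultaneous multi-slice bSSFP cardiac MRI, proposes deep artifact suppression based on a 3D U-Net, enabling fast online reconstruction that substantially shortens acquisition and reconstruction time while maintaining diagnostic image quality.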

AI Chinese abstract

目的：实时（RT）bSSFP MRI 可实现快速自由呼吸心血管成像，但功能评估需要 10-16 层，导致扫描时间延长。同步多层（SMS）成像可减少采集时间，但与非线性轨迹结合时，依赖迭代重建，阻碍了在线使用。本研究探索深度伪影抑制以促进 RT-SMS 的快速在线重建。方法：在 1.5 T 下实现了一种同时采集两层的螺旋 bSSFP SMS RT 序列。重建使用 k 空间层分离，随后在图像空间中使用 3D U-Net 进行深度伪影抑制。对十名健康志愿者进行成像。比较深度伪影抑制和压缩感知（CS）重建的 RT-SMS 图像质量和重建时间。比较深度伪影抑制的 RT-SMS 与参考标准屏气（BH）成像的左心室（LV）和右心室（RV）舒张末期容积（EDV）和收缩末期容积（ESV）以及 LV 质量（LVM）。结果：RT-SMS 采集比 BH 成像快约 13 倍（15 秒 vs 3 分 15 秒）。使用深度伪影抑制的 RT-SMS 重建比 CS 快约 50 倍（30 秒 vs 24 分 55 秒）。深度伪影抑制在定量和定性图像质量上始终优于 CS（p<0.001）。BH 与深度伪影抑制的 RT-SMS 之间的功能一致性良好（LVEDV：-7.5 +/- 6.8 ml，LVESV：-0.9 +/- 4.2 ml，RVEDV：-6.4 +/- 8.4 ml，RVESV：0.2 +/- 10.7 ml，LVM：-10.3 +/- 11.0 g）。结论：RT-SMS bSSFP CMR 的在线深度伪影抑制重建实现了自由呼吸短轴覆盖，同时大幅减少了采集和重建时间，并保持了诊断图像质量。

English abstract

Purpose: Real-time (RT) bSSFP MRI enables fast free-breathing cardiovascular imaging but requires 10-16 slices for functional assessment, resulting in prolonged scan times. Simultaneous multi-slice (SMS) imaging can reduce acquisition time but when combined with non-Cartesian trajectories, it relies on iterative reconstructions that preclude online use. This study investigates deep artifact suppression to facilitate rapid, online reconstruction of RT-SMS. Methods: A spiral bSSFP SMS RT sequence with two simultaneously acquired slices was implemented at 1.5 T. Reconstruction used slice separation in k-space, followed by deep artifact suppression in image space using a 3D U-Net. Ten healthy volunteers were imaged. RT-SMS image quality and reconstruction time were compared between deep artifact suppression and compressed sensing (CS) reconstructions. Left (LV) and right (RV) ventricular volumes at end diastole (EDV) and end systole (ESV) and LV mass (LVM) were compared between RT-SMS with deep artifact suppression and reference-standard breath-hold (BH) imaging. Results: The RT-SMS acquisition was ~13x faster than BH imaging (15 s vs 3 min 15 s). RT-SMS reconstruction using deep artifact suppression was ~50x faster than CS (30 s vs 24 min 55 s). Deep artifact suppression consistently outperformed CS in quantitative and qualitative image quality (p<0.001). Functional agreement between BH and RT-SMS with deep artifact suppression was good (LVEDV: -7.5 +/- 6.8 ml, LVESV: -0.9 +/- 4.2 ml, RVEDV: -6.4 +/- 8.4 ml, RVESV: 0.2 +/- 10.7 ml, LVM: -10.3 +/- 11.0 g). Conclusion: Online deep artifact suppression reconstruction for RT-SMS bSSFP CMR enables free-breathing short-axis coverage with a substantial reduction in acquisition and reconstruction time while maintaining diagnostic image quality.

2605.26119 2026-05-27 cs.DC cs.AI

Edge AI Deployment Beyond Models: A BSP-Aware Systems Framework for Industrial Embedded Platforms

超越模型的边缘AI部署：面向工业嵌入式平台的BSP感知系统框架

Pitchai Muthu M

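AI summary: Proposes a five-layer BSP-aware systems framework that treats edge AI deployment as a systems engineering problem, addressing the integration of models with hardware, the BSP, and the runtime on industrial embedded platforms to improve reproducibility, diagnosability, sustained throughput, and field reliability.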

Comments 17 pages, 5 figures, industrial white paper

AI Chinese abstract

工业边缘AI项目通常从模型开始，之后才面对平台。这种顺序具有吸引力，因为它允许早期演示，但当部署目标是具有长产品生命周期、供应商特定内核、异构加速器、安全约束和非平凡I/O路径的嵌入式系统时，这种方法就会失效。在这种环境中，模型只是从传感器开始、经过板级支持包（BSP）、最终进入生产服务循环的更大执行链中的一个组成部分。本文认为，稳健的边缘AI部署必须被视为一个系统问题，而不是一个后期应用打包练习。本文提出了一个面向工业嵌入式平台的BSP感知框架，围绕五个层次组织：硬件、BSP/操作系统适配、运行时与加速、应用/推理、以及运维/验证。讨论基于Android、NXP i.MX、NVIDIA Jetson、ONNX Runtime和TensorRT的供应商架构文档，以及关于嵌入式AI基准测试、设备不稳定性和异构边缘机群的系统文献。结果是一个实用框架，将底层平台工作与可衡量的部署成果（如可复现性、可诊断性、持续吞吐量和现场可靠性）联系起来。

English abstract

Industrial Edge AI programs often begin with the model and only later confront the platform. That sequencing is attractive because it allows early demonstrations, but it breaks down when the deployment target is an embedded system with long product lifecycles, vendor-specific kernels, heterogeneous accelerators, safety constraints, and nontrivial I/O paths. In that environment, a model is only one component of a larger execution chain that begins at the sensor, traverses the board support package (BSP), and ends in a production service loop. This paper argues that robust Edge AI deployment must be treated as a systems problem rather than a late-stage application packaging exercise. The paper presents a BSP-aware framework for industrial embedded platforms organized around five layers: hardware, BSP/operating-system adaptation, runtime and acceleration, application/inference, and operations/validation. The discussion is grounded in vendor architecture documentation for Android, NXP i.MX, NVIDIA Jetson, ONNX Runtime, and TensorRT, and in systems literature on embedded AI benchmarking, device instability, and heterogeneous edge fleets. The result is a practical framework that connects low-level platform work to measurable deployment outcomes such as reproducibility, diagnosability, sustained throughput, and field reliability.

2605.26118 2026-05-27 cs.DC cs.AI

Xe-Forge: Multi-Stage LLM-Powered Kernel Optimization for Intel GPU

Xe-Forge：面向Intel GPU的多阶段LLM驱动的内核优化

Marcin Spoczynski, Daniel Fleischer, Moshe Berchansky, Gabriela Ben-Melech Stan, Shira Guskin, Weilin Xu, Adam Siemieniuk, Alexander Heinecke

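AI summary: Proposes Xe-Forge, a multi-stage LLM pipeline that uses Chain-of-Verification-and-Refinement (CoVeR) agents and hardware validation to automatically optimize Triton kernels for Intel GPUs, achieving a 1.17x geometric mean speedup and 2-13.3x speedups on Flash Attention.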

AI Chinese abstract

将深度学习算法移植到新的硬件加速器上，要求开发人员对其代码库中的每个Triton内核重复应用相同的底层优化——量化、内存访问合并、分块大小调整以及特定架构的变通方法。这种手动、重复的工作是一个主要瓶颈：每个内核都需要针对不同设备间变化的硬件约束进行相同的试错分析，而底层的优化模式却基本一致。我们提出了Xe-Forge，一个多阶段LLM驱动的流水线，为Intel GPU自动化这一过程。给定一个功能正确的Triton内核，该系统应用多达九个优化阶段——从算法重构和算子融合，到块指针现代化、GPU特定调优和开放式探索——每个阶段由一个Chain-of-Verification-and-Refinement（CoVeR）代理驱动，该代理生成候选方案，在真实硬件上验证，并对失败进行迭代。一个精心策划的知识库编码了Intel GPU约束（2的幂次线程束计数、GRF模式、SLM大小），这些约束在LLM训练数据中缺失，使模型保持在架构有效范围内。我们在97个Level-2 KernelBench内核和Intel Arc Pro B70上的Flash Attention上评估了Xe-Forge，实现了相对于PyTorch eager的几何平均1.17倍加速，67%的内核得到改进，九个内核超过5倍（最高82倍），并且在所有测试配置下Flash Attention加速2-13.3倍且无回归——这表明结构化领域知识与硬件在环验证可以系统地消除当前阻碍算法在新加速器上部署的重复移植工作。

English abstract

Porting deep learning algorithms to new hardware accelerators requires developers to repeatedly apply the same low-level optimizations -- quantization, memory access coalescing, tile size tuning, and architecture-specific workarounds -- to every Triton kernel in their code-base. This manual, repetitive effort is a major bottleneck: each kernel demands the same cycle of trial-and-error profiling against hardware constraints that vary across devices, yet the underlying optimization patterns remain largely consistent. We present Xe-Forge, a multi-stage LLM-powered pipeline that automates this process for Intel GPU. Given a functionally correct Triton kernel, the system applies up to nine optimization stages -- from algorithmic restructuring and operator fusion through block pointer modernization, GPU-specific tuning, and open-ended discovery -- each driven by a Chain-of-Verification-and-Refinement (CoVeR) agent that generates candidates, validates them on real hardware, and iterates on failures. A curated knowledge base encodes Intel GPU constraints (power-of-two warp counts, GRF modes, SLM sizing) that are absent from LLM training data, keeping the model within architecturally valid bounds. We evaluate Xe-Forge on 97 Level-2 KernelBench kernels and Flash Attention on the Intel Arc Pro B70, achieving a 1.17x geometric mean speedup over PyTorch eager with 67% of kernels improving, nine kernels exceeding 5x (up to 82x), and 2--13.3x speedups on Flash Attention across all tested configurations without regression -- demonstrating that structured domain knowledge with hardware-in-the-loop verification can systematically eliminate the repetitive porting effort that currently gates algorithm deployment on new accelerators.
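
The headline metrics in the abstract (a geometric mean speedup and the share of kernels that improve) can be computed as sketched below. The helper names and the timing values are hypothetical; they are not taken from Xe-Forge or KernelBench.

```python
import math

def geometric_mean_speedup(baseline_ms, optimized_ms):
    """Geometric mean of per-kernel speedups (baseline time / optimized time)."""
    speedups = [b / o for b, o in zip(baseline_ms, optimized_ms)]
    log_mean = sum(math.log(s) for s in speedups) / len(speedups)
    return math.exp(log_mean)

def fraction_improved(baseline_ms, optimized_ms):
    """Share of kernels whose optimized time is strictly below the baseline."""
    wins = sum(1 for b, o in zip(baseline_ms, optimized_ms) if o < b)
    return wins / len(baseline_ms)

# Hypothetical timings in milliseconds for four kernels.
baseline = [8.0, 2.0, 5.0, 1.0]
optimized = [2.0, 2.0, 10.0, 0.5]
gm = geometric_mean_speedup(baseline, optimized)  # per-kernel speedups: 4, 1, 0.5, 2
share = fraction_improved(baseline, optimized)
```

The geometric mean is the usual choice for ratios because one very large speedup (such as the 82x outlier the abstract mentions) moves it far less than it would move an arithmetic mean.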

2605.25678 2026-05-27 stat.ML cs.DS cs.LG math.ST stat.TH

PAC Learning with Bandit Feedback: Sharp Sample Complexity in the Realizable Setting

带强盗反馈的PAC学习：可实现设置下的精确样本复杂度

Steve Hanneke, Qinglin Meng, Shay Moran, Amirreza Shaeiri

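AI summary: Studies multiclass PAC learning with bandit feedback in the realizable setting; by defining a new combinatorial dimension (the bandit DS dimension) and building on the ListCascade algorithm, it characterizes the optimal sample complexity up to logarithmic factors.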

Comments 18 pages

AI Chinese abstract

我们研究了可实现设置下带强盗反馈的多类PAC学习问题。在该框架中，存在一个实例空间$\mathcal{X}$和标签空间$\mathcal{Y}$上的未知数据分布，与经典多类PAC学习相同，但学习器无法观察到独立同分布训练样本的标签。相反，在每一轮中，它接收一个无标签实例，预测其标签，并接收仅指示预测是否正确的强盗反馈。尽管有此限制，目标仍与经典PAC学习相同。我们对该问题的最优样本复杂度给出了一个一般性刻画，对于每个概念类至多相差对数因子。该刻画基于一个新的组合维度，称为强盗$\mathrm{DS}$维度，通过我们称为伪盒子的广义组合结构定义。这些结构扩展了$\mathrm{DS}$维度所依赖的伪立方体，允许每个坐标有不同数量的邻居。与通过计数伪立方体中坐标数量来刻画完全信息设置的$\mathrm{DS}$维度不同，强盗$\mathrm{DS}$维度聚合了各坐标的邻居数量，从而得到样本复杂度与邻居总数成比例的刻画。我们还提出了一种通用的学习算法，称为ListCascade，实现了上界，该算法将强盗学习与列表学习联系起来，可能具有独立意义。

English abstract

We study the problem of multiclass PAC learning with bandit feedback in the realizable setting. In this framework, there is an unknown data distribution over an instance space $\mathcal{X}$ and a label space $\mathcal{Y}$, as in classical multiclass PAC learning, but the learner does not observe the labels of the i.i.d. training examples. Instead, in each round, it receives an unlabeled instance, predicts its label, and receives bandit feedback indicating only whether the prediction is correct. Despite this restriction, the goal remains the same as in classical PAC learning. We provide a general characterization of the optimal sample complexity of this problem, sharp for every concept class up to logarithmic factors. Our characterization is based on a new combinatorial dimension, termed the bandit $\mathrm{DS}$ dimension, defined via generalized combinatorial structures we call pseudo-boxes. These extend the pseudo-cubes underlying the $\mathrm{DS}$ dimension by allowing a different number of neighbors in each coordinate. In contrast to the $\mathrm{DS}$ dimension, which governs the full-information setting by counting the number of coordinates in the pseudo-cube, the bandit $\mathrm{DS}$ dimension aggregates the number of neighbors across coordinates, leading to a characterization in which the sample complexity scales with the total number of neighbors. We also propose a general learning algorithm achieving the upper bound, based on an algorithmic principle called ListCascade, which connects bandit learning to list learning and may be of independent interest.
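
To make the interaction protocol concrete, here is a minimal sketch of a learner that sees only correct/incorrect feedback. It is a plain version-space elimination over a finite class, not the ListCascade algorithm from the paper; the function name, the toy hypothesis class, and the plurality-vote prediction rule are all assumptions chosen for illustration.

```python
import random
from collections import Counter

def bandit_elimination(hypotheses, target, instances, rounds, seed=0):
    """Run a version-space learner that only sees correct/incorrect feedback.

    hypotheses: list of tuples; hypotheses[j][x] is the label h_j gives x.
    target: the true labeling, assumed to be in `hypotheses` (realizable).
    Returns the hypotheses still consistent with all feedback received.
    """
    rng = random.Random(seed)
    version_space = list(hypotheses)
    for _ in range(rounds):
        x = rng.choice(instances)                   # unlabeled instance
        votes = Counter(h[x] for h in version_space)
        guess = votes.most_common(1)[0][0]          # plurality prediction
        correct = (guess == target[x])              # the single bit of feedback
        # Keep only hypotheses that agree with the feedback on (x, guess).
        version_space = [h for h in version_space if (h[x] == guess) == correct]
    return version_space

# Hypothetical toy class: all labelings of 2 instances with 3 labels.
labels = range(3)
hypotheses = [(a, b) for a in labels for b in labels]
target = (2, 1)
survivors = bandit_elimination(hypotheses, target, instances=[0, 1], rounds=50)
```

A wrong guess rules out only one label at that instance, whereas a full-information learner would learn the true label outright. This is the intuition behind a sample complexity that scales with the number of neighbors rather than the number of coordinates.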

2605.24297 2026-05-27 cs.IR cs.AI

Benchmarking Patent Embeddings: A Multi-Task Evaluation of 22 Models Across Retrieval, Classification, and Clustering

专利嵌入基准测试：跨检索、分类和聚类任务的22个模型多任务评估

Amirhossein Yousefiramandi, Ciaran Cooney

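AI summary: By evaluating 22 pre-trained models on three tasks, finds that the optimal fine-tuning recipe depends on the downstream task and that single-landscape fine-tuning harms cross-landscape retrieval performance.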

Comments 31 pages, 21 figures

AI Chinese abstract

关于从业者使用专利嵌入的两个问题出现：(i) 一种微调方案是否适用于所有下游应用？(ii) 在一个专利领域上的微调是否足以用于其他领域的下游应用？通过评估22个预训练嵌入模型（参数从22M到12B）在三个任务——信息检索、分类和聚类——上的表现，使用113,148件WIPO辅助技术专利（46,069个引文查询）和外部DAPFAM数据集，我们发现两个结果对普遍认知提出了质疑。(i) 最优微调方案取决于下游任务：跨截面对齐（方案R3）对检索性能提升最大（+7.1% nDCG@10），而组合信号方案（方案R4）更适合分类（+7.1 F1）和聚类（+10.9 V-measure）；匹配数据控制证实训练数据集大小的差异不是影响因素。(ii) 单一领域微调损害了跨领域信息检索：在DAPFAM语料库上，对8个模型-方案组合中的5个，单一领域微调显著降低了跨域检索性能，其中零样本能力较强的模型受损最严重。虽然族内扩展一致（Qwen3 0.6B->4B->8B；Llama-Nemotron 1B->8B），但族间扩展不稳定；12B的KaLM-Gemma3在TAC检索性能上排名第8，经过前缀修改后。标题+摘要+权利要求是普遍最佳文本视图，所有模型在域内和域外性能之间存在55-65%的差距，且无法通过混合BM25-密集融合来弥补。代码和评估框架已公开。

English abstract

Two questions regarding practitioners' use of patent embeddings arise: (i) Does one fine-tuning recipe suffice for all downstream applications? (ii) Is fine-tuning on one patent landscape sufficient for downstream application on other landscapes? By evaluating 22 pre-trained embedding models (ranging from 22M to 12B parameters) on three tasks -- information retrieval, classification, and clustering -- on 113,148 WIPO patents for assistive technology (46,069 citation queries) and on an external DAPFAM dataset, we find that two results cast doubt on the prevailing wisdom. (i) The optimal fine-tuning recipe depends on the downstream task: cross-sectional alignment (recipe R3) provides the largest improvements to retrieval performance (+7.1% nDCG@10), whereas a combined signal recipe (recipe R4) is better suited to classification (+7.1 F1) and clustering (+10.9 V-measure); a matched data control confirms that differences in training dataset size are not a contributing factor. (ii) Single-landscape fine-tuning hampers cross-landscape information retrieval: fine-tuning on one landscape significantly degrades cross-domain retrieval for 5 of 8 model-recipe combinations on the DAPFAM corpus, with the stronger zero-shot models suffering most. While within-family scaling is consistent (Qwen3 0.6B->4B->8B; Llama-Nemotron 1B->8B), cross-family scaling is erratic; the 12B KaLM-Gemma3 is ranked 8th on TAC retrieval performance, following prefix modification. Title+Abstract+Claims is universally the best text view, and all models suffer from a 55-65% gap between in-domain and out-of-domain performance which cannot be mitigated by hybrid BM25-dense fusion. Code and evaluation framework are publicly available.
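
Since the retrieval results are reported in nDCG@10, a short sketch of that metric may help. The function names and relevance lists are hypothetical, and the sketch assumes the ideal ranking is built from the retrieved list only; a full evaluation would use every relevant document for the query.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k results (log2 discount)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """nDCG@k: DCG of the ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical binary relevance of ranked patents for one citation query.
perfect = ndcg_at_k([1, 1, 0, 0])   # both cited patents ranked first
late_hit = ndcg_at_k([0, 1, 0, 0])  # the only cited patent ranked second
```

Averaging `ndcg_at_k` over all citation queries gives the corpus-level score that a figure such as "+7.1% nDCG@10" would be measured against.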

2605.24038 2026-05-27 physics.space-ph astro-ph.EP astro-ph.IM cs.LG

Aurora Hunter: A Two-Stage Framework for Probabilistic Visibility Forecasting

极光猎人：一种用于概率可见性预测的两阶段框架

Zongyuan Ge, Chenwaner Zhang, Haoyang Li, Hantai Zhang, Wei Zhou, Wenxin Gu, Zhaoming Wang

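AI summary: Proposes Aurora Hunter, a two-stage cascade model that separately predicts the probability that aurora is occurring and the probability of clear observing conditions, and multiplies them to forecast aurora visibility.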

AI Chinese abstract

预测北极光可见性对于空间天气研究和极光旅游具有重要意义。某个地点和夜晚的可见性取决于两个不同因素：（1）极光是否实际发生，由太阳风-磁层耦合驱动；（2）观测条件是否允许肉眼检测，主要是云层覆盖和月照。我们提出了Aurora Hunter，一个两阶段级联模型，将这两个因素解耦。第一阶段使用XGBoost基于51个物理驱动特征预测P(发生)，这些特征在联合的Tromso+Kiruna数据（约16,600小时样本，2015-2023年）上训练，标签来自Tromso AI全天相机图像分类器。第二阶段使用逻辑回归基于21个云层覆盖和月照特征预测P(晴朗观测|发生)，仅在极光发生时段训练。级联模型P(可见)=P(发生)*P(晴朗|发生)在Tromso测试集（2019-2020年）上达到ROC-AUC 0.937，在独立Kiruna数据（2024年）上达到0.905，比单阶段基线提高了0.087。留出的Skibotn数据（2022-2025年）验证了跨站点泛化能力。SHAP识别出Kp×夜侧相互作用、MLT位置和极光椭圆距离为主要预测因子（合计39%）。原型：https://aurora-hunter.onrender.com。

English abstract

Forecasting aurora borealis visibility matters for space weather research and aurora tourism. Visibility at a site and night depends on two distinct factors: (1) whether aurora is physically occurring, driven by solar wind-magnetosphere coupling, and (2) whether observing conditions allow naked-eye detection, mainly cloud cover and lunar illumination. We present Aurora Hunter, a two-stage cascade that decouples these factors. Stage 1 predicts P(occurring) with XGBoost using 51 physics-driven features trained on joint Tromso+Kiruna data (about 16,600 hourly samples, 2015-2023) with labels from the Tromso AI all-sky image classifier. Stage 2 predicts P(clear observation given occurring) with logistic regression using 21 cloud-cover and lunar-illumination features trained only on aurora-occurring hours. The cascade P(visible)=P(occurring)*P(clear|occurring) reaches ROC-AUC 0.937 (Tromso test, 2019-2020) and 0.905 (independent Kiruna, 2024), improving a single-stage baseline by +0.087. Held-out Skibotn data (2022-2025) confirm cross-site generalization. SHAP identifies the Kp x nightside interaction, MLT position, and auroral oval distance as dominant predictors (39% combined). Prototype: https://aurora-hunter.onrender.com.
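
The cascade formula P(visible)=P(occurring)*P(clear|occurring) is simple enough to sketch directly. The function name and the example probabilities are hypothetical; in the paper the two inputs would come from the XGBoost and logistic regression stages.

```python
def cascade_visibility(p_occurring, p_clear_given_occurring):
    """Combine the stage outputs: P(visible) = P(occurring) * P(clear | occurring)."""
    for p in (p_occurring, p_clear_given_occurring):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_occurring * p_clear_given_occurring

# Hypothetical stage outputs for one site and hour.
p_visible = cascade_visibility(0.8, 0.5)
```

Because the product can never exceed either factor, a cloudy night caps the visibility forecast even when geomagnetic activity is strong.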

2605.23991 2026-05-27 physics.ao-ph astro-ph.EP cs.LG

Quantification of atmospheric carbon dioxide from the Geostationary Operational Environmental Satellite (GOES East)

从地球静止业务环境卫星（GOES East）量化大气二氧化碳

Aaron Sonabend-W, Sean Campbell, John Platt, Christopher Van Arsdale, Anna M. Michalak

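AI summary: Uses high spatiotemporal resolution data from the GOES-East satellite and DeepXCO2, a physics-guided neural network, to estimate the dry-air column CO2 mole fraction, and verifies its ability to capture realistic XCO2 variability.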

Comments 28 pages, 9 figures, 1 table

AI Chinese abstract

随着对温室气体进行本地到全球尺度CO2通量独立验证所需的分辨率、精度和准确度的需求日益迫切，当前一代天基传感器在空间和时间上仅提供稀疏观测。这一挑战激发了人们对利用原本为其他应用开发的现有任务数据来推断全球温室气体变异的兴趣。自2017年运行的地球静止业务环境卫星（GOES-East）上的先进基线成像仪（ABI）从地球静止轨道以10分钟间隔、约2 km²空间分辨率、16个光谱通道提供西半球大部分地区的全覆盖。在此，我们利用这种高空间覆盖和时间重访能力，开发了DeepXCO2——一种单像素、物理引导的神经网络，用于估算干空气柱CO2摩尔分数（XCO2）。DeepXCO2采用GOES-East的16个光谱波段的时间序列、ECMWF ERA5低对流层气象数据、MODIS地表反射率、太阳和卫星观测几何以及年积日。该网络在共置的GOES-East和OCO-2/OCO-3观测数据上训练。与保留的OCO-2和OCO-3观测年份以及TCCON网络观测相比，DeepXCO2能够捕捉真实的XCO2变异性。我们还展示了案例研究，说明利用DeepXCO2观测城市区域上空的XCO2增强和农业区域的XCO2下降。总体而言，虽然GOES-East导出的XCO2精度无法与专用仪器相媲美，但连续地理覆盖、10分钟时间频率和多年记录的 unprecedented 组合提供了观测目前从太空无法看到的大气CO2变异性的潜力。

English abstract

There is a growing urgency to track greenhouse gasses with the resolution, precision and accuracy needed to support independent verification of $CO_2$ fluxes at local to global scales. The current generation of space-based sensors, however, only provides sparse observations in space and time. This challenge has fueled interest in the potential use of data from existing missions originally developed for other applications to infer global greenhouse gas variability. The Advanced Baseline Imager (ABI) onboard the Geostationary Operational Environmental Satellite (GOES-East), operational since 2017, provides full coverage of much of the western hemisphere at 10-minute intervals from geostationary orbit across 16 spectral channels at an approximately 2 km$^2$ spatial resolution. Here, we leverage this high spatial coverage and temporal revisit to develop Deep$XCO_2$, a single-pixel, physics-guided neural network to estimate dry-air column $CO_2$ mole fraction ($XCO_2$). Deep$XCO_2$ employs a time series of GOES-East's 16 spectral bands, ECMWF ERA5 lower tropospheric meteorology, MODIS surface reflectance, solar and satellite viewing geometry, and day of year. The network was trained on collocated GOES-East and OCO-2/OCO-3 observations. Deep$XCO_2$ is able to capture realistic $XCO_2$ variability when compared against a held-out year of OCO-2 and OCO-3 observations, and against observations from the TCCON network. We also present case studies illustrating the use of Deep$XCO_2$ to observe $XCO_2$ enhancements over urban areas and drawdown over agricultural regions. Overall, while the precision of GOES-East derived $XCO_2$ can never rival that of dedicated instruments, the unprecedented combination of contiguous geographic coverage, 10-minute temporal frequency, and multi-year record offers the potential to observe aspects of atmospheric $CO_2$ variability currently unseen from space.

2605.22162 2026-05-27 astro-ph.IM astro-ph.SR cs.LG

Spectra as Language: Large Language Models for Scalable Stellar Parameter and Abundance Inference

光谱即语言：用于可扩展恒星参数和丰度推断的大型语言模型

Hai-Ling Lu, Yu-Yang Li, Yin-Bi Li, Cun-Shi Wang, A-Li Luo, Jun-Chao Liang, Shuo Li

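AI summary: Proposes a two-stage large language model framework that treats stellar spectra as sequential signals, accurately estimating effective temperature, surface gravity, metallicity, and the abundances of about 20 chemical elements, and shows scalability through systematic performance gains as data volume increases.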

AI Chinese abstract

恒星光谱编码了恒星物理性质和化学成分的关键信息。准确的恒星参数测定对于解决星系和恒星演化等重大问题至关重要。大规模光谱巡天积累了前所未有的光谱数据。传统的特征提取或模型拟合方法难以处理高维、大规模数据集，泛化能力有限且计算效率低。大型语言模型的最新进展在自然语言处理、DNA/RNA序列分析以及蛋白质/化学解析等任务中展示了强大的泛化能力和特征学习能力。恒星光谱是连续的序列信号，使得语言模型可以迁移到恒星光谱学。在此，我们提出一个两阶段大型语言模型框架用于恒星参数推断，实现了有效温度、表面重力、金属丰度以及约20种化学元素丰度的准确估计。缩放律分析显示，随着数据增加，性能系统性地提升，为即将到来的大规模巡天提供了一个可扩展的框架。

English abstract

Stellar spectra encode key information on the physical properties and chemical compositions of stars. Accurate stellar parameter determination is essential for addressing major questions such as galaxy and stellar evolution. Large-scale spectroscopic surveys have accumulated unprecedented spectral data. Traditional feature extraction or model-fitting approaches struggle with high-dimensional, massive datasets, limited generalization, and computational inefficiency. Recent advances in large language models demonstrate strong generalization and feature-learning in tasks like natural language processing, DNA/RNA sequence analysis, and protein/chemical parsing. Stellar spectra are continuous sequential signals, enabling the transfer of language models to stellar spectroscopy. Here, we propose a two-stage large language model framework for stellar parameter inference, achieving accurate estimation of effective temperature, surface gravity, metallicity, and abundances of ~20 chemical elements. Scaling-law analyses show systematic performance improvements with increasing data, providing a scalable framework for forthcoming large-scale surveys.

2605.22133 2026-05-27 q-bio.BM cs.AI

Atom-level Protein Representation Learning Improves Protein Structure Prediction

原子级蛋白质表示学习改进蛋白质结构预测

Taewon Kim, Hyosoon Jang, Hyunjin Seo, Seonghwan Seo, Hyeongwoo Kim, Wonho Zhung, Mingyeong Shin, Wooyoun Kim, Sungsoo Ahn

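AI summary: Proposes TriProRep, a structure-aware pretraining method that jointly models three aligned residue-level views encoded with VQ-VAE tokenizers, and outperforms sequence-only and prior structure-aware representation models on structure prediction tasks.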

Comments Project Page: https://holymollyhao.github.io/TriProRep/

AI Chinese abstract

生成建模的最新进展表明，预训练表示可以作为条件特征或对齐目标来改进生成。受此启发，我们研究用于预测结构（超越常规功能注释）的蛋白质表示。我们提出TriProRep，一种结构感知预训练方法，它联合建模三种对齐的残基级视图：氨基酸身份、主链几何和局部全原子几何，通过VQ-VAE分词器进行离散编码。通过预训练从生成器损坏的视图中恢复原始标记，TriProRep学会区分合理但不正确的跨视图增强与原始蛋白质。我们进一步引入RepSP，一个用于在结构预测设置中评估蛋白质表示的基准。RepSP测试表示的三种用途：从脱辅基链表示进行同源二聚体共折叠、同源二聚体衍生相互作用属性的残基级预测，以及表示对齐的单体结构预测。在这些任务中，TriProRep优于仅序列和先前的结构感知表示模型，同时在常规基准上保持竞争性能。

英文摘要

Recent advances in generative modeling show that pretrained representations can improve generation as conditioning features or alignment targets. Motivated by this, we study protein representations for predicting structures beyond conventional function annotation. We propose TriProRep, a structure-aware pretraining method that jointly models three aligned residue-level views: amino-acid identity, backbone geometry, and local full-atom geometry, discretely encoded via VQ-VAE tokenizers. By pretraining to recover original tokens from generator-corrupted views, TriProRep learns to distinguish plausible but incorrect cross-view augmentations from the original protein. We further introduce RepSP, a benchmark for evaluating protein representations in structure-predictive settings. RepSP tests three uses of representations: homodimer co-folding from apo-chain representations, residue-level prediction of homodimer-derived interaction properties, and representation-aligned monomer structure prediction. Across these tasks, TriProRep improves over sequence-only and prior structure-aware representation models, while maintaining competitive performance on conventional benchmarks.
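
The discrete encoding step mentioned in the abstract can be illustrated with the generic nearest-codebook lookup used by VQ-VAE-style tokenizers. This is only a minimal sketch, not TriProRep's tokenizer: the `quantize` helper, the two-dimensional features, and the three-entry codebook are hypothetical, and a real tokenizer would learn both the encoder and the codebook during training.

```python
def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry.

    Generic VQ-style lookup with squared Euclidean distance; the codebook
    here is fixed by hand, whereas a VQ-VAE learns it during training.
    """
    tokens = []
    for vec in features:
        dists = [sum((v - c) ** 2 for v, c in zip(vec, code)) for code in codebook]
        tokens.append(min(range(len(codebook)), key=dists.__getitem__))
    return tokens


# Hypothetical per-residue features (e.g. an encoder's output) and codebook.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
residue_features = [[0.1, -0.1], [0.9, 0.2], [0.2, 0.8]]
tokens = quantize(residue_features, codebook)
print(tokens)
```

Each residue-level view would get its own token sequence in this way, which is what makes a "recover the original tokens" pretraining objective well defined.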

2605.19052 2026-05-27 stat.ML cs.LG

Provably Data-driven Lagrangian Relaxation for Mixed Integer Linear Programming

可证明数据驱动的混合整数线性规划拉格朗日松弛

Tung Quoc Le, Anh Tuan Nguyen, Viet Anh Nguyen

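AI summary: For Lagrangian relaxation of mixed integer linear programs, uses the data-driven algorithm design framework to analyze generalization bounds and minimax-optimal rates for learned multipliers, and proves that averaged stochastic gradient ascent and learning-to-warm-start attain the optimal rates.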

Comments Accepted to ICML 2026

AI中文摘要

拉格朗日松弛（LR）是求解大规模混合整数线性规划（MILP）的强大技术，特别是那些具有可分解结构的问题，如车辆路径或机组组合问题。通过松弛耦合约束，LR能够并行求解子问题，并且通常比标准线性规划松弛产生更紧的对偶界，这对于高效的分支定界剪枝至关重要。虽然最近的实证工作显示出使用机器学习预测这些乘子的有希望的结果，但对此类方法的理论理解仍然是一个开放问题。在这项工作中，我们通过数据驱动算法设计的视角分析学习LR的问题来弥合这一差距，即在一个问题实例分布上的统计学习问题。我们的贡献如下：首先，我们推导出学习乘子的泛化界为$\mathcal{O}(s^{1.5}/\sqrt{N})$，其中$s$是耦合约束的数量，$N$是样本量。其次，我们提供了极小极大下界$\Omega(s/\sqrt{N})$，证明线性依赖是不可避免的。第三，我们通过证明带有平均的随机梯度上升（SGA）达到了极小极大最优速率$\Theta(s/\sqrt{N})$，建设性地弥合了这一理论差距。最后，我们将框架扩展到学习热启动设置，证明其达到了快速的极小极大最优速率$\Theta(s/N)$，并确立了相对于直接乘子预测的理论优势。

英文摘要

Lagrangian Relaxation (LR) is a powerful technique for solving large-scale Mixed Integer Linear Programming (MILP) problems, particularly those with decomposable structures, such as vehicle routing or unit commitment problems. By relaxing the coupling constraints, LR enables parallel subproblem solving and often yields tighter dual bounds than standard linear programming relaxations, which is crucial for efficient branch-and-bound pruning. While recent empirical work has shown promising results using machine learning to predict these multipliers, a theoretical understanding of such methods remains an open question. In this work, we bridge this gap by analyzing the problem of learning LR through the lens of Data-driven Algorithm Design, i.e., a statistical learning problem over a distribution of problem instances. Our contributions are as follows: first, we derive a generalization bound of $\mathcal{O}(s^{1.5}/\sqrt{N})$ for the learned multipliers, where $s$ is the number of coupling constraints and $N$ is the sample size. Second, we provide a minimax lower bound of $\Omega(s/\sqrt{N})$, proving that a linear dependency is unavoidable. Third, we constructively close this theoretical gap by proving that Stochastic Gradient Ascent (SGA) with averaging achieves the minimax optimal rate $\Theta(s/\sqrt{N})$. Finally, we extend our framework to the learning-to-warm-start setting, proving that it achieves a fast, minimax-optimal rate of $\Theta(s/N)$ and establishing a theoretical advantage over direct multiplier prediction.
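
To make the setting concrete, the following toy sketch learns one multiplier vector for a whole distribution of instances by projected stochastic (sub)gradient ascent with iterate averaging. It is an illustration under stated assumptions rather than the paper's algorithm or analysis: the problem (choose at least 3 of 6 items at minimum random cost), the helper names, and the step-size schedule `step0 / sqrt(t)` are all hypothetical choices.

```python
import random


def dual_value_and_subgradient(c, A, b, lam):
    """Evaluate L(lam) = min_{x in {0,1}^n} c.x + lam.(b - A x) for lam >= 0.

    The coupling constraints A x >= b are relaxed, so the minimization
    separates per variable: x_j = 1 exactly when its reduced cost is negative.
    Returns the dual value and the subgradient b - A x at the minimizer.
    """
    n, s = len(c), len(b)
    x = []
    for j in range(n):
        reduced_cost = c[j] - sum(lam[i] * A[i][j] for i in range(s))
        x.append(1 if reduced_cost < 0 else 0)
    slack = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(s)]
    value = sum(c[j] * x[j] for j in range(n)) + sum(lam[i] * slack[i] for i in range(s))
    return value, slack


def averaged_sga(sample_instance, s, steps, step0, rng):
    """Projected stochastic ascent on E[L(lam)]; returns the average iterate."""
    lam = [0.0] * s
    avg = [0.0] * s
    for t in range(1, steps + 1):
        c, A, b = sample_instance(rng)
        _, grad = dual_value_and_subgradient(c, A, b, lam)
        eta = step0 / t ** 0.5
        lam = [max(0.0, lam[i] + eta * grad[i]) for i in range(s)]  # keep lam >= 0
        avg = [avg[i] + (lam[i] - avg[i]) / t for i in range(s)]    # running mean
    return avg


def sample_instance(rng):
    """Toy instance: choose at least 3 of 6 items at minimum random cost."""
    c = [rng.uniform(1.0, 2.0) for _ in range(6)]
    return c, [[1.0] * 6], [3.0]


rng = random.Random(0)
lam_hat = averaged_sga(sample_instance, s=1, steps=2000, step0=0.5, rng=rng)
c, A, b = sample_instance(rng)
bound, _ = dual_value_and_subgradient(c, A, b, lam_hat)
primal_opt = sum(sorted(c)[:3])  # the 3 cheapest items solve this toy problem exactly
print(lam_hat, bound, primal_opt)
```

By weak duality, `bound` can never exceed `primal_opt` for any nonnegative multiplier, so the learned `lam_hat` always yields a valid lower bound on a fresh instance; how tight that bound is depends on how well the multiplier generalizes, which is the quantity the abstract's rates describe.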

2605.18468 2026-05-27 stat.ML cs.LG

Shallow ReLU$^s$ Networks in $L^p$-Type and Sobolev Spaces: Approximation and Path-Norm Controlled Generalization

浅层ReLU$^s$网络在$L^p$型空间和Sobolev空间中的逼近与路径范数控制的泛化

Weizhao Li, Fanghui Liu, Lei Shi

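AI summary: Studies the approximation power of shallow ReLU$^s$ networks in $L^p$-type and Sobolev spaces, and shows that $\ell_1$ path-norm control yields minimax-optimal generalization error (up to logarithmic factors) for nonparametric regression.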

Comments 42 pages, 1 figure. Update Theorem 2 and fix some typos. Authors are listed in alphabetical order and contributed equally

AI中文摘要

本文研究浅层ReLU$^s$网络（$\sigma_s(t)=\max\{0,t\}^s$）的逼近性质及其在$\ell_1$路径范数控制下的泛化行为。对于$L^p$型积分空间$\widetilde{\mathcal{F}}_{p,\tau_d,s}$（$1\le p\le2$），球谐分析给出了浅层网络的逼近界。特别地，当$\tau_d$为均匀测度且$1\le p<2$时，逼近率为：当$1\le p\le p^*$时为$O\!\left(m^{-\frac{p(2s+2d+1)-2d}{2dp}}\right)$，当$p^*<p<2$时为$O\!\left(m^{-\frac{p(4s+3d-1)-2d+2}{4dp}}\right)$，其中$p^*=\frac{2d+2}{d+3}$。通过嵌入到谱Barron空间，得到了Sobolev空间$W^{\alpha,p}$（$1\le p<2$）的逼近界。对于亚高斯噪声下的非参数回归，路径范数正则化的浅层ReLU$^s$网络在$\mathscr{B}_s$上达到极小极大最优速率$O\!\left(n^{-\frac{d+2s+1}{2d+2s+1}}\log n\right)$，在$W^{\alpha,\infty}$上达到$O\!\left(n^{-\frac{2\alpha}{2\alpha+d}}\log n\right)$，且下界匹配至对数因子。

英文摘要

This paper studies approximation by shallow ReLU$^s$ networks, $\sigma_s(t)=\max\{0,t\}^s$, together with their generalization behavior under $\ell_1$ path-norm control. For the $L^p$-type integral spaces $\widetilde{\mathcal{F}}_{p,\tau_d,s}$, $1\le p\le2$, spherical harmonic analysis yields approximation bounds for shallow networks. In particular, when $\tau_d$ is the uniform measure and $1\le p<2$, the approximation rate is $O\!\left(m^{-\frac{p(2s+2d+1)-2d}{2dp}}\right)$ for $1\le p\le p^*$ and $O\!\left(m^{-\frac{p(4s+3d-1)-2d+2}{4dp}}\right)$ for $p^*<p<2$, where $p^*=\frac{2d+2}{d+3}$. Approximation bounds for Sobolev spaces $W^{\alpha,p}$, $1\le p<2$, are obtained through embeddings into spectral Barron spaces. For nonparametric regression with sub-Gaussian noise, path-norm-regularized shallow ReLU$^s$ networks achieve minimax-optimal rates $O\!\left(n^{-\frac{d+2s+1}{2d+2s+1}}\log n\right)$ over $\mathscr{B}_s$ and $O\!\left(n^{-\frac{2\alpha}{2\alpha+d}}\log n\right)$ over $W^{\alpha,\infty}$, with matching lower bounds up to logarithmic factors.
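
The objects named in the abstract are easy to write down for a small example. The sketch below evaluates a width-2 shallow ReLU$^s$ network, computes an $\ell_1$ path norm, and evaluates the two approximation-rate exponents quoted above. The path-norm convention used here, $\sum_k |a_k|\,(\|w_k\|_1+|b_k|)^s$, is an assumption chosen because it respects the degree-$s$ homogeneity of $\sigma_s$; the paper's exact definition may differ, and all function names and weights are hypothetical.

```python
def relu_s(t, s):
    """ReLU^s activation sigma_s(t) = max(0, t)**s (intended for s >= 1)."""
    return max(0.0, t) ** s


def shallow_net(x, W, b, a, s):
    """f(x) = sum_k a_k * sigma_s(<w_k, x> + b_k) for a width-m shallow network."""
    return sum(
        a_k * relu_s(sum(w * xi for w, xi in zip(w_k, x)) + b_k, s)
        for w_k, b_k, a_k in zip(W, b, a)
    )


def path_norm(W, b, a, s):
    """Assumed convention: sum_k |a_k| * (||w_k||_1 + |b_k|)**s."""
    return sum(
        abs(a_k) * (sum(abs(w) for w in w_k) + abs(b_k)) ** s
        for w_k, b_k, a_k in zip(W, b, a)
    )


def p_star(d):
    """Threshold p* = (2d + 2) / (d + 3) from the abstract."""
    return (2 * d + 2) / (d + 3)


def approx_exponent(p, d, s):
    """Exponent r in the O(m^{-r}) approximation rate from the abstract (1 <= p < 2)."""
    if p <= p_star(d):
        return (p * (2 * s + 2 * d + 1) - 2 * d) / (2 * d * p)
    return (p * (4 * s + 3 * d - 1) - 2 * d + 2) / (4 * d * p)


# Hypothetical width-2 network on R^2 with s = 2.
W, b, a, s = [[1.0, -2.0], [0.5, 0.5]], [0.0, -0.25], [2.0, -1.0], 2
x = [1.0, 0.25]
print(shallow_net(x, W, b, a, s), path_norm(W, b, a, s))
print(p_star(3), approx_exponent(1.0, 3, 2), approx_exponent(1.5, 3, 2))
```

Two sanity checks follow from the formulas themselves: rescaling $(w_k, b_k)\mapsto(cw_k, cb_k)$ and $a_k\mapsto a_k/c^s$ for $c>0$ leaves both the network output and this path norm unchanged, and the two exponent branches coincide at $p=p^*$, so the quoted rate is continuous in $p$.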
