arXivDaily: daily arXiv digest, updated Monday to Friday
2605.25725 2026-05-26 cs.CV

TriDP-PTM: a three-stage distortion-perception tradeoff guides the pre-training model for radar cardiac sensing

Jinye Li, Aidong Men, Yang Liu, Qingchao Chen

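AI summary: Proposes the Three-stage Distortion-Perception Pre-Training Model (TriDP-PTM), which uses an indirect radar-to-ECG-to-task path and a composite loss function to achieve the best downstream clinical accuracy in the coopetitive stage.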

Abstract

Cardiovascular diseases (CVDs) remain a leading cause of death globally, necessitating continuous, accurate non-invasive cardiac monitoring. While non-contact radar-based approaches show great promise, they often employ a single "distortion-driven" or "perception-driven" paradigm, frequently facing a trade-off between "low distortion but weak semantic information" and "high perceptual fidelity but poor interpretability." To address this, we propose a Three-stage Distortion-Perception Pre-Training Model (TriDP-PTM), a radar-based multi-scale fusion dual-path framework that systematically compares the "direct radar-to-task" path against an "indirect radar-to-ECG-to-task" path. By integrating an ECG generator with a feature discriminator to form a composite loss function, our approach effectively incorporates medical priors (such as ECG morphology and rhythm) into downstream tasks. Through empirical analysis, we reveal that this trade-off manifests in three distinct phases (Positive-Sum, Coopetitive, and Negative-Sum) and show that optimal downstream clinical accuracy typically emerges in the coopetitive stage. Extensive experiments on a dataset involving 30 subjects across 5 physiological states reveal that the indirect path consistently outperforms the direct path in diverse tasks, achieving 0.80 mean IoU in waveform segmentation, 98.3% average classification accuracy across four tasks, and a 56% MAE reduction in blood pressure regression compared to the strongest baselines. These findings validate our framework and indicate that, within the indirect radar-to-ECG pathway, appropriately weighting distortion and perception losses to operate in the coopetitive regime is critical for achieving both clinically interpretable ECG morphology and strong downstream accuracy in non-contact cardiac monitoring.
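
The composite objective can be pictured as a weighted sum of a distortion term and a perception term. The sketch below is hypothetical, not the authors' code: it assumes the distortion term is a mean squared error between generated and reference ECG, the perception term is a non-saturating adversarial loss on feature-discriminator scores, and a single weight `lam` (an assumed name) trades them off.

```python
import math


def distortion_loss(ecg_pred, ecg_ref):
    """Distortion term: mean squared error between generated and reference ECG samples."""
    return sum((p - r) ** 2 for p, r in zip(ecg_pred, ecg_ref)) / len(ecg_ref)


def perception_loss(disc_scores):
    """Perception term: non-saturating generator loss -mean(log D(G(x))),
    where disc_scores are discriminator probabilities in (0, 1)."""
    eps = 1e-12  # guards against log(0)
    return -sum(math.log(max(s, eps)) for s in disc_scores) / len(disc_scores)


def composite_loss(ecg_pred, ecg_ref, disc_scores, lam):
    """Hypothetical weighting: lam = 0 is purely distortion-driven;
    larger lam shifts the objective toward perception."""
    return distortion_loss(ecg_pred, ecg_ref) + lam * perception_loss(disc_scores)


# Sweep the weight; in practice each lam would be trained and evaluated downstream.
pred, ref, scores = [0.1, 0.5, 0.9], [0.0, 0.5, 1.0], [0.5, 0.5]
for lam in (0.0, 0.1, 1.0):
    print(f"lam={lam}: loss={composite_loss(pred, ref, scores, lam):.4f}")
```

Training one model per `lam` and plotting downstream accuracy against it is one plausible way to look for the positive-sum, coopetitive, and negative-sum phases the abstract describes.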

2605.25720 2026-05-26 cs.AI

Learning to Search and Searching to Learn for Generalization in Planning

Michael Aichmüller, Yannik Hesse, Hector Geffner

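AI summary: Proposes a self-improving WA* learning framework with a value heuristic represented by a relational graph neural network; the heuristic guides search and is updated by Q-learning, achieving zero-shot generalization and outperforming deep reinforcement learning on multiple planning tasks.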

Comments
Accepted at ICML 2026

Abstract

Combinatorial generalization remains a central challenge in Deep Reinforcement Learning (DRL). Classical planning provides a simple yet challenging setting to study this problem through explicit relational descriptions, without requiring learning from perception. In sparse-reward domains, standard RL exploration via real-time search is ineffective, and learning-based planning methods often rely on expert demonstrations, hindsight relabeling, or random walks from the goal state. In contrast, planners rely on best-first search methods such as $\mathrm{A}^\star$ to solve problems from scratch. We propose a self-improving $\mathrm{WA}^\star$ learning framework in combination with a value heuristic represented by a Relational Graph Neural Network: the heuristic guides search, and the resulting search data updates the heuristic via $Q$-learning. This loop yields heuristics that can function as general policies and solve new instances even without search, where DRL otherwise fails, as we show on puzzles such as Sokoban, PushWorld, The Witness, and the 2023 International Planning Competition benchmarks. Notably, we demonstrate strong zero-shot generalization: For example, heuristics trained on Blocksworld instances with fewer than 30 blocks successfully solve instances with 488 blocks without search.
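
The search-and-learn loop pairs weighted A* (expanding states by f = g + w·h) with a one-step Bellman target that Q-learning can regress the heuristic toward. The sketch below is a simplified illustration on a 4-connected grid with unit action costs; Manhattan distance stands in for the paper's learned relational GNN heuristic, and the function names are hypothetical.

```python
import heapq


def wastar(start, goal, passable, h, w=2.0):
    """Weighted A*: expand nodes in order of f = g + w * h (unit step costs).
    Returns (path, expanded_states); path is None if the goal is unreachable."""
    open_heap = [(w * h(start), 0, start)]
    g = {start: 0}
    parent = {start: None}
    closed = set()
    while open_heap:
        _, g_s, s = heapq.heappop(open_heap)
        if s in closed:
            continue
        closed.add(s)
        if s == goal:
            path = []
            while s is not None:
                path.append(s)
                s = parent[s]
            return path[::-1], closed
        x, y = s
        for t in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if passable(t) and g_s + 1 < g.get(t, float("inf")):
                g[t] = g_s + 1
                parent[t] = s
                heapq.heappush(open_heap, (g[t] + w * h(t), g[t], t))
    return None, closed


def bellman_target(s, goal, passable, h):
    """One-step target min over successors of [1 + h(s')]; a learned h is regressed toward it."""
    if s == goal:
        return 0.0
    x, y = s
    succ = [t for t in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)) if passable(t)]
    return min(1 + h(t) for t in succ) if succ else float("inf")


# 5x5 grid with a wall segment; Manhattan distance plays the role of the heuristic.
goal = (4, 4)
wall = {(2, 1), (2, 2), (2, 3)}
passable = lambda p: 0 <= p[0] < 5 and 0 <= p[1] < 5 and p not in wall
manhattan = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
path, expanded = wastar((0, 0), goal, passable, manhattan)
targets = {s: bellman_target(s, goal, passable, manhattan) for s in expanded}
```

In the self-improving loop, the `targets` collected from expanded states would serve as regression labels for the next heuristic; here they are only computed, not trained on.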

2605.25717 2026-05-26 cs.AI cs.CE cs.LG

FLOATBench: A Dataset and Benchmark for Floating Offshore Wind Turbine Tower Fatigue

João Alves Ribeiro, Bruno Alves Ribeiro, Francisco Pimenta, Sérgio M. O. Tavares, Faez Ahmed

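AI summary: Presents FLOATBench, a tabular benchmark of 582,120 fatigue-damage labels based on high-fidelity simulations of 22 MW floating wind turbine towers, and introduces a regime-aware evaluation protocol that detects performance-ranking shifts that random splits cannot reveal.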

Abstract

Most of the world's offshore wind resource lies in waters too deep for fixed-bottom foundations, making floating offshore wind turbines (FOWTs) essential for deep-water deployment. As the industry scales toward $22$ MW class designs, tower fatigue becomes increasingly critical because larger structures amplify the coupled aero-hydro-servo-elastic loads induced by continuous wind and wave excitation. Accurate fatigue-damage prediction is therefore central to certification, design optimization, and cost reduction. Yet the field lacks a shared surrogate benchmark: studies report different simulations, splits, and metrics, making methods difficult to compare. We present FLOATBench, a public tabular benchmark with $582{,}120$ per-section fatigue-damage labels across three $22$ MW FOWT tower geometries, derived from $19{,}404$ high-fidelity OpenFAST simulations across the three towers ($6{,}468$ per tower: $1{,}078$ aligned wind/wave operating points $\times$ six turbulence seeds), labeled at $30$ cross-sections per tower. FLOATBench includes a regime-aware alpha-shape partition of the joint wind/wave operating envelope, stratifying test points into in-train, interpolation, and extrapolation regimes. It is paired with a reproducible evaluation harness covering three protocol levels: random validation (E1), within-tower regime-aware evaluation (E2), and cross-tower transfer (E3). The regime-aware protocol reveals rank shifts between global and extrapolation performance that random-split leaderboards cannot detect. To the authors' knowledge, FLOATBench is the first FOWT fatigue benchmark for tabular surrogate modeling, and it offers an evaluation protocol that generalizes to engineering surrogates defined over physical operating envelopes. Dataset and code are available at: https://github.com/Joao97ribeiro/FLOATBench.
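
FLOATBench's regime labels come from an alpha shape of the joint wind/wave operating envelope. As a simplified, hedged sketch, the code below substitutes the convex hull (the limiting case of an alpha shape) and applies three assumed rules: a test point that coincides with a training operating point is in-train, a point inside the hull is interpolation, and anything else is extrapolation. The axes (mean wind speed, significant wave height) and the sample points are hypothetical; the benchmark's actual partition is in the linked repository.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_hull(hull, p):
    """True if p lies inside or on a counter-clockwise convex polygon."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


def regime(train_pts, p, tol=1e-9):
    """Label a test operating point relative to the training envelope (assumed rules)."""
    if any(abs(p[0] - q[0]) <= tol and abs(p[1] - q[1]) <= tol for q in train_pts):
        return "in-train"
    return "interpolation" if inside_hull(convex_hull(train_pts), p) else "extrapolation"


# Hypothetical training operating points: (wind speed in m/s, wave height in m).
train = [(4.0, 1.0), (12.0, 1.0), (12.0, 5.0), (4.0, 5.0), (8.0, 3.0)]
for test_point in [(8.0, 3.0), (6.0, 2.0), (20.0, 2.0)]:
    print(test_point, regime(train, test_point))
```

A true alpha shape can carve concave pockets out of the envelope, so points that this convex-hull version calls interpolation could be labeled extrapolation under the benchmark's partition.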

2605.25710 2026-05-26 physics.chem-ph cond-mat.mtrl-sci cs.LG physics.comp-ph

Machine Learning Multiscale Interactions

Àlex Solé, Sergio Suárez-Dou, Albert Mosella-Montoro, Silvia Gómez-Coca, Eliseo Ruiz, Alexandre Tkatchenko, Javier Ruiz-Hidalgo

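AI summary: Proposes the Multiscale Structural Ensemble (MuSE), a hierarchical model that builds multiscale representations through soft coarse-graining pooling and couples with several machine learning force fields to accurately capture quantum-mechanical interactions across scales.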

Abstract

Realistic physical systems are characterised by emergent interactions across multiple length and time scales, posing a significant challenge for predictive machine learning (ML) models. Most scientific ML models focus on a narrow range of interactions. While machine learning force fields (MLFFs) offer near-quantum accuracy, the ubiquitous message-passing layers miss long-range many-body effects. Here we introduce the Multiscale Structural Ensemble (MuSE), a hierarchical model that uses Soft Coarse-Graining Pooling to construct coarse representations from smooth fractional assignments of atoms to coarse nodes, enabling MLFF modules to operate across multiple scales. MuSE is architecture-agnostic and has been coupled with the SO3krates, MACE, and PaiNN MLFFs for both molecules and materials. We demonstrate the power of MuSE through Hessian-based benchmarks, folding trajectories for biomolecules, and energy profiles in molecule-graphene nanostructures, where MuSE accurately captures quantum-mechanical interactions at relevant scales, unlike other recent long-range ML models.
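
Soft coarse-graining can be read as a fractional assignment matrix S (atoms by coarse nodes) that pools atom features into coarse features. The sketch below is a generic illustration under stated assumptions, not MuSE's implementation: the assignment logits are given directly (a real model would learn them), rows are normalized with a softmax, coarse features are S^T X, and coarse positions are membership-weighted averages.

```python
import math


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in row]
    z = sum(e)
    return [v / z for v in e]


def soft_coarse_grain(logits, feats, pos):
    """Soft pooling of atoms into coarse nodes.
    logits: n_atoms x n_coarse assignment scores (learned in a real model).
    feats:  n_atoms x d atom features.
    pos:    n_atoms x 3 atom coordinates.
    Returns S, coarse features S^T X, and membership-weighted coarse positions."""
    S = [softmax(r) for r in logits]  # each atom's fractional memberships sum to 1
    n, k, d = len(S), len(S[0]), len(feats[0])
    coarse_feats = [[sum(S[i][c] * feats[i][j] for i in range(n)) for j in range(d)]
                    for c in range(k)]
    mass = [sum(S[i][c] for i in range(n)) for c in range(k)]
    coarse_pos = [[sum(S[i][c] * pos[i][j] for i in range(n)) / mass[c] for j in range(3)]
                  for c in range(k)]
    return S, coarse_feats, coarse_pos


# Four atoms on a line, softly split into two coarse nodes.
logits = [[2.0, 0.0], [2.0, 0.0], [0.0, 2.0], [0.0, 2.0]]
feats = [[1.0], [2.0], [3.0], [4.0]]
pos = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 0.0, 0.0], [6.0, 0.0, 0.0]]
S, coarse_feats, coarse_pos = soft_coarse_grain(logits, feats, pos)
```

Because the assignment is a smooth softmax rather than a hard clustering, it stays differentiable, and with rows summing to 1 the total feature mass is conserved across coarse nodes.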

2605.25708 2026-05-26 cs.CV cs.CL cs.ET

CMAP: Cross-Modal Adaptive Prompting for Multi-Domain Task-Incremental Learning

Sriram Mandalika

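AI summary: For multi-domain task-incremental learning, proposes a cross-modal adaptive prompting method that uses CLIP's text embedding space for task routing, confidence estimation, and encoder adaptation, surpassing prior methods on the MTIL benchmark.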

Abstract

Multi-domain task-incremental learning requires a model to sequentially acquire knowledge across visually diverse domains without forgetting prior tasks, and without access to task identity at inference. Parameter-efficient methods built on frozen vision-language models have made strong progress, yet all existing approaches rely exclusively on visual features for task routing, confidence estimation, and encoder adaptation, leaving CLIP's cross-modal text embedding space entirely unexploited. We address this gap through three contributions. Text-space task routing replaces visual Gaussian matching with cosine similarity to frozen CLIP text prototypes, giving order-independent routing robust to data scarcity at zero parameter cost. Multi-prototype visual-textual confidence replaces single-Gaussian class modeling with K-means visual prototypes and cross-modal alignment scores under task-calibrated thresholds. Symmetric cross-modal gating extends per-layer Gumbel gates to the text encoder conditioned on batch image features, preserving cross-modal alignment on out-of-distribution inputs. On the MTIL benchmark spanning 11 datasets and 1201 classes, our method achieves 74.2% Transfer, 80.5% Average, and 88.7% Last under Order-I, surpassing the prior state of the art by 5.0, 3.7, and 3.0 percentage points with only 2.5M trainable parameters and no external data.
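
Text-space task routing compares an image embedding against frozen text prototypes with cosine similarity and picks the best-matching task. The sketch below is an illustration under assumptions: the toy 3-d vectors stand in for CLIP embeddings, and a task's score is taken to be its best class match (maximum cosine); the paper's exact scoring rule may differ.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def route_task(image_emb, text_prototypes):
    """Route an image to the task whose frozen text prototypes it matches best.
    text_prototypes maps task name -> list of class-text embeddings.
    A task's score is its best class match (max cosine), which is an assumption."""
    scores = {task: max(cosine(image_emb, p) for p in protos)
              for task, protos in text_prototypes.items()}
    return max(scores, key=scores.get), scores


# Toy 3-d stand-ins for CLIP text embeddings of each task's class names.
prototypes = {
    "aircraft": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "flowers": [[0.0, 1.0, 0.0], [0.0, 0.8, 0.2]],
    "food": [[0.0, 0.0, 1.0]],
}
task, scores = route_task([0.8, 0.2, 0.1], prototypes)
```

Nothing here is trained, and each task's score depends only on its own prototypes, so adding a new task leaves earlier tasks' scores unchanged; this is one plausible reading of "order-independent" routing at zero parameter cost.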

2605.25707 2026-05-26 cs.AI

AgentHijack: Benchmarking Computer Use Agent Robustness to Common Environment Corruptions

Jingwei Sun, Jianing Zhu, Yuanyi Li, Tongliang Liu, Xia HU, Bo Han

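AI summary: Proposes the AgentHijack benchmark, which uses 9 configurable common environment corruptions to evaluate the robustness of computer-use agents driven by multimodal large language models, and designs the AgentHijack-Agent framework to improve their resistance to corruptions.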

Comments
Accepted by ICML 2026

Abstract

Autonomous computer-use agents powered by multimodal large language models (MLLMs) are emerging as capable assistants for completing complex digital workflows. However, real-world execution environments are far from ideal: pop-ups, resolution changes, and competing applications frequently interfere with agent perception and control. We introduce AgentHijack, a benchmark designed to evaluate the robustness of computer-use agents under common corruptions, where uncertainties in dynamic environments disrupt the execution flow without direct adversarial intent. Specifically, AgentHijack introduces 9 configurable common corruptions to replicate realistic imperfect scenarios. We evaluate MLLM-based agents on a variety of desktop tasks and find that even minor corruptions can cause substantial performance degradation, which highlights the fragility of agents and underscores the need for robustness evaluation. We then propose AgentHijack-Agent, a framework that integrates an action generator with enhanced grounding capabilities and an onlooker responsible for behavior summarization and environment checking. Extensive experiments validate its effectiveness. Our code, environment, baseline models, and data are publicly available at: https://AgentHijack.github.io.

2605.25706 2026-05-26 cs.CV

Towards Open-World Referring Expression Comprehension: A Benchmark with Training-free Multi-task Consistency Checker

Zongjian Wu, Lei Zhang

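AI summary: Addressing the limitation of existing referring expression comprehension (REC) benchmarks to simple scenes and single-target assumptions, proposes the OpenRef benchmark, which covers diverse visual scenes, variable target counts, and rich vocabulary types, and introduces a training-free Multi-task Consistency Checker (MCC) to improve model performance in open-world settings.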

Comments
17 pages, 7 figures. Project Page: https://zongjianwu.github.io/openref

Abstract

Referring expression comprehension (REC) aims to localize a target object within an image based on a given expression. Although recent advances in vision-language models have substantially improved REC performance, current REC benchmarks are often limited to simple scenarios and assume that each expression maps to a unique object. These limitations hinder the deployment of REC models in open-world environments. To fill this gap, we introduce OpenRef, a new benchmark for REC in complex visual and linguistic scenarios. OpenRef features three key advances: 1) Diverse visual scenarios: it spans diverse visual domains, including ground views, drone views, dark scenes, and adverse weather conditions; 2) Variable target counts: it breaks the single-target limitation with multi-target and zero-target samples; 3) Rich vocabulary types: it incorporates proper nouns, polysemous words, and ordinal terms to cover a wider range of expressions. Furthermore, because traditional metrics are insufficient for open-world settings, we use F1 to measure grounding accuracy and propose N3R (Negative Relative Rejection Reliability) to assess relative rejection reliability on negative expressions. Finally, we introduce the Multi-task Consistency Checker (MCC), a training-free, plug-and-play strategy that enhances model performance with one click by enforcing consistency self-verification. Extensive experiments demonstrate that this work significantly advances the performance of existing REC models in complex scenarios, paving the way for open-world REC. Project page: https://zongjianwu.github.io/openref

2605.25704 2026-05-26 cs.CL cs.LG

PowLU: An Activation Function for Stable Pre-Training of LLMs

Peijie Jiang, Yuqi Feng, Cunyin Peng, Qian Zhao, Jia Liu, KunLong Chen, Zhiqiang Zhang, Jun Zhou

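AI summary: Proposes the PowLU activation function, which uses a rational power function to achieve adaptive nonlinearity and address the numerical instability of SwiGLU in low-precision LLM training; in large-scale training it performs on par with SwiGLU and SwiGLU-Clip while improving scalability.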

Comments
17 pages, 7 figures, technical report

Abstract

In contemporary large language models (LLMs), the swish-gated linear unit (SwiGLU) activation function is widely adopted to regulate the information flow and introduce non-linearity. For large positive inputs, SwiGLU approximates the quadratic function $x^2$, providing strong nonlinearity and expressive capacity. However, this property also causes numerical instability as the input or model scale increases, particularly in low-precision LLM training. The main reason is its approximate quadratic amplification, which enlarges the output range and exacerbates outliers. To address this issue, we propose a stable activation function, Power Linear Unit (PowLU), for large-scale LLM pre-training. Specifically, PowLU employs a rational power function to achieve adaptive nonlinearity, thereby improving representation ability and enabling stable training in spike regions. Moreover, we provide theoretical justification for several key properties of PowLU. Scaling law experiments confirm that the performance is consistent across model sizes, and further experimental results with the Ling architecture (7.9B and 124B total parameters) demonstrate that PowLU achieves competitive results against SwiGLU and SwiGLU-Clip in large-scale training of LLMs. The experiments also show that PowLU effectively improves the scalability of large-scale LLM training.
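
The starting observation, that SwiGLU behaves like $x^2$ for large positive inputs, can be checked numerically in a scalar toy setting. This sketch assumes unit gate and value weights (real SwiGLU uses learned projection matrices), in which case SwiGLU reduces to $x^2\sigma(x)$. PowLU's exact formula is not given in the abstract, so it is not reproduced here.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def swiglu_scalar(x, w=1.0, v=1.0):
    """Scalar SwiGLU: Swish(w * x) * (v * x), with Swish(z) = z * sigmoid(z)."""
    z = w * x
    return z * sigmoid(z) * (v * x)


# With w = v = 1 this is x**2 * sigmoid(x), so the ratio to x**2 tends to 1.
for x in (1.0, 4.0, 16.0, 64.0):
    ratio = swiglu_scalar(x) / x ** 2
    print(f"x={x:5.1f}  swiglu={swiglu_scalar(x):10.3f}  ratio to x^2={ratio:.6f}")
```

The quadratic growth on the positive side is what widens the output range; for negative inputs the sigmoid factor drives the output toward zero instead.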

2605.25701 2026-05-26 cs.DC cs.CL cs.IR cs.NI

Neural Router: Semantic Content Matching for Agentic AI

Lauri Lovén, Abhishek Kumar, Alexander Engelhardt, Alaa Saleh, Roberto Morabito, Xiaoli Liu, Naser Hossein Motlagh, Sasu Tarkoma

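AI summary: Proposes using large language models as the semantic-matching engine of a content-based publish/subscribe broker, characterizes the cost-accuracy trade-off by analyzing a context-window crossover and a discrimination-capacity crossover, and provides three composable algorithms and a framework for autonomic LLM-tier selection.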

Comments
35 pages, 12 figures. Combined main paper and electronic supplement, folded into one document for arXiv

Abstract

Large language models (LLMs) can serve as the semantic-matching engine of a content-based publish/subscribe broker for agentic AI across the edge-cloud computing continuum, bridging the vocabulary and modality gaps that defeat keyword and embedding filters. We frame the problem as offline multi-label retrieval over three public datasets spanning social-media, legal, and smart-home sensor domains (six LLMs, seven baselines). Our central contribution is a two-crossover cost-accuracy characterisation: an analytical context-window crossover below which a CoverAndMerge compression pipeline reduces LLM invocations, and an empirical discrimination-capacity crossover above which matching accuracy collapses independently of context budget, with the crossover point depending on model-specific factors (parameter count and training generation). Two findings carry practical weight: above the discrimination crossover, compression cannot recover accuracy and only frontier-scale models clear large subscription sets; and in that regime, backend choice dominates configuration choice, so model selection, not pipeline tuning, is the primary operator lever. We accompany this with three composable algorithms and a per-cluster Quality-of-Experience framework for autonomic LLM-tier selection.

2605.25698 2026-05-26 cs.LG cs.AI

How Should LLMs Consume High-Quality Data? Optimal Data Scheduling via Quality-Aware Functional Scaling Laws

Zhitao Zhu, Xili Wang, Shizhe Wu, Jiawei Fu, Xiaoqing Liu

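AI summary: Extends functional scaling laws with a data-quality dimension, analytically solves the joint data-quality and batch-size scheduling problem, reveals the dual role of high-quality data, and proposes the Drop-Stable-Rampup schedule, which improves average accuracy on a 15B MoE model by +1.70 over WSD and +2.98 over cosine decay.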

Abstract

High-quality data is scarce in large language model (LLM) training, yet how to schedule its use jointly with training dynamics lacks theoretical guidance. We extend functional scaling laws by incorporating a data-quality dimension, and solve the joint data-quality and batch-size scheduling problem in asymptotic closed form. The solution reveals two regimes and a dual role of high-quality data. In the noise-limited regime, high-quality data should be used as a signal amplifier: lowering the batch size converts cleaner data into more signal without amplifying noise. In the signal-limited regime, it should be used as a noise suppressor: late placement reduces terminal noise without sacrificing signal accumulation. Existing curriculum-style pipelines primarily exploit the second role by placing cleaner data late, but miss the first role because conventional decay schedules reduce update intensity exactly when high-quality data becomes available. Guided by this, we propose Drop-Stable-Rampup for LLM midtraining: upon the quality transition, drop the batch size, hold it stable to accumulate signal, then ramp up to suppress terminal noise. On a 15B Mixture-of-Experts model midtrained on 108B tokens, Drop-Stable-Rampup improves average accuracy over Warmup-Stable-Decay (WSD) by +1.70 and over Cosine-decay by +2.98, with particularly large gains on mathematical reasoning benchmarks such as GSM8K (+4.23) and MATH (+2.80).
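
The Drop-Stable-Rampup phases map naturally onto a batch-size schedule indexed by training step. The sketch below is hypothetical: the batch sizes, phase lengths, and the linear shape of the ramp are all assumptions chosen for illustration, and the learning-rate schedule is not modeled.

```python
def drop_stable_rampup(step, transition, stable_len, ramp_len,
                       base_bs=1024, dropped_bs=256, final_bs=1024):
    """Hypothetical batch-size schedule around a data-quality transition.
    Before `transition`: base batch size.
    Drop:   switch to `dropped_bs` when high-quality data arrives.
    Stable: hold `dropped_bs` for `stable_len` steps to accumulate signal.
    Rampup: grow linearly to `final_bs` over `ramp_len` steps to suppress terminal noise."""
    if step < transition:
        return base_bs
    t = step - transition
    if t < stable_len:
        return dropped_bs
    t -= stable_len
    if t < ramp_len:
        return round(dropped_bs + (final_bs - dropped_bs) * t / ramp_len)
    return final_bs


# Quality transition at step 100, 50 stable steps, then a 100-step ramp.
schedule = [drop_stable_rampup(s, transition=100, stable_len=50, ramp_len=100)
            for s in range(0, 300, 25)]
print(schedule)
```

A conventional decay schedule would instead keep the batch size fixed while lowering update intensity late in training; the drop at the transition is what lets the cleaner data act as a signal amplifier in the abstract's terms.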

2605.25696 2026-05-26 cs.LG

Evaluating passing decision-making in professional football: An enhanced MPNN approach to Receiver Selection

Gabriel Masella, Giuseppe Alessio D'Inverno, Max Goldsmith, Gianluigi Rozza

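AI summary: Proposes a graph neural network framework that models on-field interactions as dynamic graphs to predict the optimal passing target, achieving competitive accuracy on receiver selection and evaluating more than 1,000 passes in seconds.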

Abstract

The process of decision-making in football is characterized by a complex interplay between spatial positioning, opponent pressure, and player intent. This work introduces a Graph Neural Network (GNN) framework designed to predict Receiver Selection, the optimal passing target, by modeling on-field interactions as dynamic graphs. Each player is represented as a node with positional and contextual features, while potential passing lines form weighted edges characterized by distance, angle, and pressure metrics. We develop and train a Message-Passing Neural Network (MPNN) on a combination of tracking data and event data from professional matches, synchronized through a robust pipeline based on an optimized version of the Needleman-Wunsch algorithm. The model achieves competitive accuracy in identifying the actual chosen receiver and state-of-the-art accuracy within its top three suggestions. Our model further quantifies each option's likelihood, threat, and creativity, enabling performance analysts to evaluate over 1,000 passes in seconds.
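
The synchronization step builds on Needleman-Wunsch global alignment. The sketch below is the plain textbook version, not the paper's optimized variant; the scoring scheme (match +1, mismatch -1, gap -1) and the idea of aligning a logged event sequence with event labels inferred from tracking data are illustrative assumptions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of sequences a and b.
    Returns (score, aligned_a, aligned_b), with None marking gaps."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        diag = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(None)
            i -= 1
        else:
            out_a.append(None)
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], out_a[::-1], out_b[::-1]


# Hypothetical: logged events vs. events inferred from tracking data.
logged = ["pass", "pass", "shot"]
inferred = ["pass", "pass", "dribble", "shot"]
score, aligned_logged, aligned_inferred = needleman_wunsch(logged, inferred)
```

The quadratic-time table is fine for short windows; a production pipeline over full matches would likely need banding or chunking, which may be what the abstract's "optimized version" refers to.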

2605.25693 2026-05-26 cs.CL cs.DB cs.MA

From Facts to Insights: A Persona-Driven Dual Memory Framework and Dataset for Role-Playing Agents

Rongsheng Zhang, Ruofan Hu, Weijie Chen, Jiji Tang, Junnan Ren, Wanying Wu, Xunuoyan Chen, Tangjie Lv, Tao Jin, Zhou Zhao

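AI summary: Addressing the loss of persona consistency in role-playing agents during long-term conversations caused by context-window limits, proposes the persona memory dataset RoleMemo and the dual-memory framework DualMem, which decouples memory into factual cognition and persona-conditioned insight and combines supervised fine-tuning with reinforcement learning; with a 4B-parameter model it outperforms zero-shot persona-agnostic frameworks based on DeepSeek-V3.2.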

Comments
Preprint

Abstract

While role-playing agents excel in short-term interactions, long-term conversations overwhelm context windows, motivating external memory frameworks. Current systems typically rely on persona-agnostic summarization, which records facts without persona-specific interpretation, yielding generic responses that compromise persona fidelity. To bridge this gap, we introduce RoleMemo, a dataset featuring four reasoning tasks where the factual fragments must be interpreted through the persona to reach the correct answer. Evaluation on RoleMemo exposes critical limitations of persona-agnostic frameworks. We thus propose DualMem, which decouples memory into two streams: factual cognition and persona-conditioned insight. Trained through Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), our framework with a 4B-parameter model outperforms zero-shot persona-agnostic frameworks powered by DeepSeek-V3.2 in sustained persona fidelity. Our resources are available at https://github.com/role2026/rolememo.

2605.25686 2026-05-26 cs.CL

Testing the Deliteralization Hypothesis in Human and Machine Translation

Malik Marmonier, Rachel Bawden, Benoît Sagot

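AI summary: Compares the literality of human translations, NMT systems, and LLMs across 54 language pairs to test whether the deliteralization hypothesis applies to LLM generation and revision.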

Abstract

The recent shift from dedicated NMT systems to general-purpose LLMs has reshaped machine translation, with LLMs reported to produce more fluent, less literal output than their predecessors. We test whether this shift extends to the deliteralization hypothesis, the long-standing claim from translation studies that translations become progressively less literal as they are drafted and revised. Using the WMT24++ dataset, we compare the literality of human translations and post-editions to that of two NMT systems and six LLMs across 54 language pairs and three tasks: direct translation, iterative self-revision, and post-editing of human drafts. Literality is measured via a validated Synthetic Literality Index built from six heuristics. We find that (i) human translations remain significantly less literal than those of all tested MT systems, though recent LLMs narrow the gap; (ii) when prompted to iteratively revise their own output, LLMs deliteralize monotonically, providing the first evidence that the hypothesis applies natively to LLM generation; and (iii) as post-editors, LLMs invert the revision triggers of human post-editors, tolerating literal drafts and targeting idiomatic human formulations for revision.

2605.25685 2026-05-26 cs.RO

HumanFlow -- Diffusion-Driven MAV Navigation Among Humans via Tightly-Coupled Motion Tracking, Forecasting, and Control

Simon Schaefer, Joshua Näf, Stefan Leutenegger

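AI summary: Proposes HumanFlow, a latent diffusion model that unifies human motion tracking and forecasting using 3D scene context, achieving accurate and efficient motion estimation under heavy occlusion, and enables collision-free MAV navigation among humans through tight coupling with control.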

Comments
Accepted to Robotics: Science and Systems (RSS), 2026

Abstract

Robust and accurate perception of humans in their 3D scene context is essential for integrating robots into everyday environments. Existing approaches, however, often fail to predict plausible and accurate human motion estimates that are consistent with the surrounding scene, especially in the presence of heavy occlusions or partial visibility. This can limit both safety and efficiency for robotic operations. We introduce HumanFlow, a latent diffusion model that unifies human motion tracking and forecasting, conditioned on the 3D scene context. We show that our human motion model produces smooth and accurate predictions under challenging conditions, including heavy occlusions, and outperforms state-of-the-art methods in tracking accuracy while being significantly more efficient. Furthermore, we show how HumanFlow's latent space can be tightly coupled with control by conditioning a flow-matching-based, approximate MPC policy on these representations. We validate our policy in simulation with real human trajectories for MAV social navigation, demonstrating superior navigation performance and remaining collision-free, even under partial observability of the human.

2605.25682 2026-05-26 cs.DC cs.AI

Profiling-Driven Adaptive Distributed Transformer Inference on Embedded Edge Deployment

Muhammad Azlan Qazi, Alexandros Iosifidis, Qi Zhang

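AI summary: Combines Segment Means compression with lightweight offline profiling to adaptively choose between local and distributed execution at runtime, addressing the CPU-GPU communication bottleneck of distributed Transformer inference on embedded devices and reducing latency by 65%-77% and energy consumption by 34%-52% relative to full-tensor exchange.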

Abstract

Distributing Transformer inference across embedded edge devices can alleviate individual memory and compute constraints, yet practical benefits on real hardware remain unclear: prior work relies largely on simulations that overlook hardware-specific communication overheads. We present a hardware prototype study on NVIDIA Jetson Orin Nano devices connected over WiFi. Our key finding is that the dominant bottleneck is not just network bandwidth but also CPU-GPU staging during communication. Because Jetson's integrated GPU architecture lacks the PCIe/NVLink pathway that NCCL requires, all inter-device data communication must be routed through GLOO and staged in CPU memory. This overhead scales with communication data volume and makes full-tensor exchange slower than single-device inference across batch sizes for medium-sized models such as ViT. We therefore evaluate Prism, which combines Segment Means compression with lightweight offline profiling to adaptively select between local and distributed execution at runtime. Experiments show that this strategy reduces latency by 65%-77% and energy consumption by 34%-52% relative to full-tensor exchange in a static distributed execution setup, demonstrating that profiling-driven adaptation is essential for practical distributed Transformer inference on embedded hardware.
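
The two ingredients can be sketched separately: Segment Means shrinks a token sequence by averaging contiguous segments (reducing the data staged through CPU memory), and an offline profile drives the local-versus-distributed decision. This is a hedged illustration: `choose_mode`, the nearest-batch-size lookup, and the latency numbers are hypothetical, since the abstract does not specify Prism's decision rule.

```python
def segment_means(tokens, num_segments):
    """Compress n d-dim token vectors to num_segments vectors by averaging
    contiguous segments (requires num_segments <= n so no segment is empty)."""
    n, d = len(tokens), len(tokens[0])
    out = []
    for s in range(num_segments):
        lo, hi = s * n // num_segments, (s + 1) * n // num_segments
        seg = tokens[lo:hi]
        out.append([sum(t[j] for t in seg) / len(seg) for j in range(d)])
    return out


def choose_mode(batch_size, profile):
    """profile: {batch_size: {"local": ms, "distributed": ms}} measured offline.
    Pick the faster mode at the nearest profiled batch size (hypothetical rule)."""
    nearest = min(profile, key=lambda b: abs(b - batch_size))
    entry = profile[nearest]
    return min(entry, key=entry.get)


# Hypothetical offline measurements (milliseconds per batch).
profile = {
    1: {"local": 20.0, "distributed": 35.0},
    8: {"local": 120.0, "distributed": 90.0},
}
compressed = segment_means([[1.0, 1.0], [3.0, 3.0], [5.0, 5.0], [7.0, 7.0]], 2)
mode_small, mode_large = choose_mode(2, profile), choose_mode(7, profile)
```

Profiling offline keeps the runtime decision to a table lookup, which matters on a device where the inference itself is the bottleneck.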

2605.25681 2026-05-26 cs.LG cs.AI

Don't Retrain, Just Reuse: Recovering Dual-Target Molecules from Single-Target Diffusion Models

不要重新训练,只需重用:从单目标扩散模型中恢复双目标分子

Qingyuan Zeng, Pengxiang Cai, Zixin Guan, Ziyang Chen, Anglin Liu, Lang Qin, Xinyao Lai, Jintai Chen

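AI summary: Proposes REUSE, a framework that recovers dual-target molecules from a frozen single-target diffusion model through hierarchical evolutionary input-space search, without retraining or modifying the diffusion process, improving dual-target affinity by 20.9 percentage points.

Chinese abstract (AI-generated)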

设计一个能调节两个靶点的单一分子是多药理学中一种有前景的策略,但它比标准的单目标生成要困难得多,因为一个候选分子必须满足两个结合要求,同时保持药物相似性和可合成性。现有的双目标生成方法通常通过在采样期间重新训练生成器或干预扩散过程来引入双目标能力。前者在双目标监督稀疏时可能成本高昂且难以稳定,而后者可能对去噪时的目标平衡和竞争性更新方向敏感。这些局限性促使我们寻找一种保持生成器不变的替代方案:能否在不修改参数或去噪动态的情况下,从冻结的单目标扩散模型的输入空间中恢复双目标候选分子?我们将此任务表述为一个受约束的多目标优化问题,并提出REUSE,一种层次化进化输入空间搜索框架,结合配对条件探索和结构化多阶段选择,以强制执行双目标亲和力、化学质量和多样性。实验表明,与修改扩散过程的方法相比,REUSE持续改善了双目标亲和力和平衡性,在双高亲和力指标上比最强基线提高了20.9个百分点,同时保持了竞争性的分子质量。

English abstract

Designing a single molecule that modulates two targets is a promising strategy for polypharmacology, but it remains substantially harder than standard single-target generation because one candidate must satisfy two binding requirements while preserving drug-likeness and synthesizability. Existing dual-target generative methods typically introduce dual-target capability by either retraining the generator or intervening in the diffusion process during sampling. The former can be costly and difficult to stabilize when dual-target supervision is sparse, while the latter may be sensitive to denoising-time target balancing and competing update directions. These limitations motivate a generator-preserving alternative that keeps the pretrained prior intact: can dual-target candidates instead be recovered from the input space of a frozen single-target diffusion model, without modifying its parameters or denoising dynamics? We formulate this task as a constrained multi-objective optimization problem and propose REUSE, a hierarchical evolutionary input-space search framework that combines pair-conditioned exploration with structured multi-stage selection to enforce dual-target affinity, chemical quality, and diversity. Experiments show that, compared with methods that modify the diffusion process, REUSE consistently improves dual-target affinity and balance, achieving a 20.9-percentage-point gain in Dual High Affinity over the strongest prior baseline while maintaining competitive molecular quality.
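A generic version of input-space search over a frozen generator can be sketched as follows. This is not REUSE itself: its hierarchical, pair-conditioned structure is omitted, and `generate`, `objectives`, and `feasible` are hypothetical stand-ins for the frozen single-target diffusion sampler, the two affinity scores, and the drug-likeness/synthesizability constraints. Only the sampler's input vector is evolved; its parameters and denoising dynamics are never touched.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if score vector a is no worse than b everywhere and better somewhere (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Tuple[float, ...]]) -> List[int]:
    """Indices of the non-dominated score vectors."""
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)]


def evolve_inputs(generate: Callable[[Vector], object],
                  objectives: Callable[[object], Tuple[float, ...]],
                  feasible: Callable[[object], bool],
                  dim: int, pop_size: int = 16, generations: int = 20,
                  sigma: float = 0.3, seed: int = 0) -> List[Vector]:
    """Evolve the *inputs* of a frozen generator toward a multi-objective Pareto front."""
    rng = random.Random(seed)
    pop = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        # Gaussian mutation of randomly chosen parents.
        children = [[g + rng.gauss(0, sigma) for g in rng.choice(pop)] for _ in range(pop_size)]
        candidates = pop + children
        samples = [generate(z) for z in candidates]
        # Constraint handling: drop infeasible samples unless nothing is feasible.
        keep = [i for i, s in enumerate(samples) if feasible(s)] or list(range(len(candidates)))
        scores = [objectives(samples[i]) for i in keep]
        front = [keep[i] for i in pareto_front(scores)]
        # Survivors: the Pareto front first, then the rest by summed score.
        rest = sorted((i for i in keep if i not in front),
                      key=lambda i: -sum(objectives(samples[i])))
        pop = [candidates[i] for i in (front + rest)[:pop_size]]
    return pop
```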

2605.25680 2026-05-26 cs.CL cs.AI

Simulating Human Memory with Language Models

用语言模型模拟人类记忆

Qihan Wang, Nicholas Tomlin, Michael Hu, Brian Dillon, Tal Linzen

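AI summary: Compares language-model and human memory using classic memory experiments from psychology, finding that untuned models remember better than humans, but that prompting strategies and a compactor can make models forget in a more human-like way, making them more effective user simulators in a downstream education task.

Chinese abstract (AI-generated)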

语言模型越来越多地被部署为用户模拟器,但它们的记忆远比真实用户可靠。为了衡量这一差距,我们在人类和语言模型上进行了一系列来自心理学的经典记忆实验。跨任务我们发现,未经调优的语言模型表现出比人类更好的记忆,即使在被提示模仿人类行为时也是如此。然后我们表明,更好的提示策略和使用压缩器可以使语言模型以更类似人类的方式遗忘内容。使用这些方法,我们初步证明,具有人类类似记忆约束的语言模型可以在下游教育任务中作为更有效的用户模拟器。最后,我们发布人类参考数据和基准,以支持未来关于用语言模型模拟人类记忆的工作。

English abstract

Language models are increasingly being deployed as user simulators, but their memory is far more reliable than that of real users. To measure this gap, we run a series of classic memory experiments from psychology on both humans and language models. Across tasks, we find that out-of-the-box language models exhibit better memory than humans, even when prompted to imitate human behavior. We then show that better prompting strategies and the use of a compactor can cause language models to forget content in a more human-like way. Using these methods, we show preliminary evidence that language models with human-like memory constraints can function as more effective user simulators in a downstream education task. Finally, we release human reference data and benchmarks to support future work on simulating human memory with language models.

2605.25676 2026-05-26 cs.CL

Llamion Technical Report

Llamion 技术报告

Kisu Yang, Yoonna Jang, Hyeonseok Moon, Hwanseok Jang, Taewoo Lee, Hyungjin Lee, Jeseung Lee, Juhyoung Park, Heuiseok Lim

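AI summary: Proposes the KEPT method to convert Orion-14B into Llamion, a model with the Llama architecture, recovering performance on a small amount of data through parameter mapping and knowledge distillation and achieving leading results on KoMMLU.

Comments
Research conducted in 2024
Chinese abstract (AI-generated)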

我们发布了 Llamion,一个 14B 参数的开源语言模型系列,通过将 Orion-14B 转换为标准化的 Llama 家族架构得到。该转换通过高效知识保留转换(KEPT)方法完成,该方法结合了 (i) 用于未改变模块的正常参数映射(NPM),(ii) 优化参数映射(OPM),一种无需训练的 LayerNorm 到 RMSNorm 初始化,我们证明在权重衰减引起的近零均值激活机制下该初始化是最优的,以及 (iii) 跨架构知识蒸馏(XKD),一种等大小的冻结教师蒸馏,将转换后模型的输出与源模型在任何合理输入分布上的输出对齐。Llamion 在单个 A100 上仅用约 1.23 亿 token 和四天时间,在 H6、MT-Bench 和 KoMMLU 上恢复了 Orion 的行为;Llamion-Base 在 KoMMLU 上达到 66.87%,在提交时比 Open Ko LLM Leaderboard 的次优条目高出超过 7.0 个绝对百分点。转移语料库中完全缺失的能力(Python 编程和 20 万 token 上下文处理)在架构转换后完整保留。我们发布了三个检查点(Base、Chat、LongChat),可在 Hugging Face Transformers 库中以 trust_remote_code=False 加载。

English abstract

We release Llamion, a family of 14B-parameter open-weight language models obtained by transforming Orion-14B into the standardized Llama-family architecture. The transformation is performed by Efficient Knowledge Preservation for Transformation (KEPT), a recipe that combines (i) Normal Parameter Mapping (NPM) for unchanged modules, (ii) Optimized Parameter Mapping (OPM), a training-free LayerNorm-to-RMSNorm initialization we prove optimal under the near-zero-mean activation regime induced by weight decay, and (iii) Cross-architecture Knowledge Distillation (XKD), an equal-size frozen-teacher distillation that aligns the converted model's outputs with the source model's on any reasonable input distribution. Llamion recovers Orion's behaviour on H6, MT-Bench, and KoMMLU with only ~123M tokens on a single A100 in four days; Llamion-Base reaches 66.87% on KoMMLU, exceeding the next-best entry of the Open Ko LLM Leaderboard by >7.0 absolute points at submission time. Capabilities entirely absent from the transfer corpus (Python programming and 200K-token context handling) survive the architectural transition intact. We release three checkpoints (Base, Chat, LongChat) that load with trust_remote_code=False in the Hugging Face Transformers library.
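The intuition behind a training-free LayerNorm-to-RMSNorm mapping can be checked numerically. The sketch below assumes the mapping copies the LayerNorm gain into the RMSNorm weight and drops the bias (the report's exact OPM rule may differ); it shows that the two layers agree exactly when the input has zero mean, the limiting case of the near-zero-mean regime described above.

```python
import math
from typing import List


def layer_norm(x: List[float], gamma: List[float], beta: List[float], eps: float = 1e-6) -> List[float]:
    """Standard LayerNorm over one feature vector."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [g * (v - mu) / math.sqrt(var + eps) + b for v, g, b in zip(x, gamma, beta)]


def rms_norm(x: List[float], weight: List[float], eps: float = 1e-6) -> List[float]:
    """RMSNorm: no mean subtraction and no bias."""
    ms = sum(v * v for v in x) / len(x)
    return [w * v / math.sqrt(ms + eps) for v, w in zip(x, weight)]


def ln_to_rms_weight(gamma: List[float]) -> List[float]:
    """Assumed training-free initialization: reuse the LayerNorm gain as the RMSNorm weight.

    If mean(x) == 0, then var(x) == mean(x**2), so LayerNorm with beta == 0 and
    RMSNorm with weight == gamma produce identical outputs.
    """
    return list(gamma)
```

When the mean drifts away from zero the two outputs diverge, which is why the optimality claim is tied to the activation regime induced by weight decay.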

2605.25674 2026-05-26 cs.LG

Stochastic Estimation of the Layer-wise Hessian Trace for Monitoring Neural-network Training

逐层Hessian迹的随机估计用于监测神经网络训练

Maxim Bolshim, Alexander Kugaevskikh

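AI summary: Proposes a stochastic estimator that combines Hutchinson trace estimation with a single Hessian-vector product to obtain, in one backward pass, unbiased estimates of the trace of every layer's diagonal Hessian block, and applies it to detecting the label-memorisation regime.

Comments
9 pages, 1 table
Chinese abstract (AI-generated)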

损失及其梯度范数只能微弱地区分神经网络训练的健康和病态阶段,而经验风险的曲率在两者间有质的差异,但在参数数量$P\sim 10^{6}-10^{8}$时无法显式计算。我们提出了一种神经网络经验风险Hessian矩阵对角块迹的随机估计器。该过程将Hutchinson随机迹估计与整个参数向量上的单次Hessian-向量积相结合,并在计算图的单次反向传播中恢复每层迹的无偏估计。我们证明,在权重共享下,正确性要求逐层Hessian在第二次微分之前组装:将共享权重展开为独立坐标会引入系统偏差,其符号和大小由展开Hessian的跨实例块控制。推导了固定Hessian下估计器方差的闭式表达式,以及小批量采样分布下总方差的分解。该分解产生一个临界探测次数$K^{\star}$,平衡了两个随机源,并支持在线监测模式下$K\in[5,10]$的实用建议。该估计器应用于检测ResNet-18、ResNet-34和VGG-11在CIFAR-10和CIFAR-100上的标签记忆化阶段,其中校准的累积和决策规则在虚警率$16/120$下达到了$179/180$的经验检测能力。

English abstract

The loss and the norm of its gradient separate the healthy and the pathological regimes of neural-network training only weakly, whilst the curvature of the empirical risk differs qualitatively between them but is inaccessible explicitly at parameter counts $P\sim 10^{6}-10^{8}$. We present a stochastic estimator of the trace of the diagonal blocks of the Hessian matrix of the empirical risk of a neural network. The procedure combines the Hutchinson stochastic trace estimator with a single Hessian-vector product over the whole parameter vector and recovers unbiased estimates of every per-layer trace in one backward pass through the computational graph. We show that correctness under weight sharing requires the layer-wise Hessian to be assembled before the second differentiation: unrolling shared weights into independent coordinates introduces a systematic bias whose sign and magnitude are governed by the cross-instance blocks of the unrolled Hessian. A closed-form expression for the variance of the estimator at a fixed Hessian is derived, together with a decomposition of the total variance under the mini-batch sampling distribution. This decomposition yields a critical probe count $K^{\star}$ that balances the two sources of randomness and supports the practical recommendation $K\in[5,10]$ in the on-line monitoring regime. The estimator is applied to the detection of the label-memorisation regime of ResNet-18, ResNet-34, and VGG-11 on CIFAR-10 and CIFAR-100, where a calibrated cumulative-sum decision rule attains an empirical detection power of $179/180$ at a false-alarm rate of $16/120$.
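The core identity, that one full-vector Hessian-vector product yields unbiased estimates of every per-layer trace, can be sketched with an explicit toy Hessian. `dense_hvp` is a hypothetical stand-in for the double-backward HVP an autodiff framework would provide, and the Rademacher probes and layer index ranges are illustrative assumptions; the weight-sharing correction and the variance analysis are not shown.

```python
import random
from typing import Callable, Dict, List, Sequence


def hutchinson_layer_traces(hvp: Callable[[List[float]], List[float]],
                            layers: Dict[str, range],
                            num_params: int,
                            num_probes: int = 8,
                            seed: int = 0) -> Dict[str, float]:
    """Estimate tr(H_ll) for every diagonal block using ONE full-vector HVP per probe.

    For a Rademacher probe v, E[v_l . (H v)_l] = tr(H_ll) because E[v_i v_j] is 1 if
    i == j and 0 otherwise, so the off-diagonal blocks H_lm (l != m) cancel in expectation.
    """
    rng = random.Random(seed)
    totals = {name: 0.0 for name in layers}
    for _ in range(num_probes):
        v = [rng.choice((-1.0, 1.0)) for _ in range(num_params)]
        hv = hvp(v)  # a single Hessian-vector product over the whole parameter vector
        for name, idx in layers.items():
            totals[name] += sum(v[i] * hv[i] for i in idx)
    return {name: t / num_probes for name, t in totals.items()}


def dense_hvp(h: Sequence[Sequence[float]]) -> Callable[[List[float]], List[float]]:
    """Toy HVP from an explicit matrix (only feasible for tiny parameter counts)."""
    return lambda v: [sum(hij * vj for hij, vj in zip(row, v)) for row in h]
```

With a diagonal Hessian every probe returns the exact block traces; with off-diagonal coupling the estimate is exact only in expectation, and its spread shrinks as the probe count $K$ grows.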

2605.25673 2026-05-26 cs.CR cs.AI

Referential Security as a New Paradigm for AI Evaluations

引用安全性作为AI评估的新范式

Dan Ristea, Vasilios Mavroudis

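AI summary: Addressing the unstable evaluation identifiers caused by continuously updated AI systems, proposes the referential security paradigm, which treats model identity as a verifiable property to ensure reproducible evaluation, longitudinal audit validity, and cross-provider equivalence.

Chinese abstract (AI-generated)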

安全评估本质上依赖于稳定的标识符。任何发现、审计或监管决策必须始终附属于其所涉及的具体工件。持续更新的人工智能系统违反了这一核心假设,公开的模型名称保持不变,而底层权重、提示、检索机制、滥用分类器、推理设置和服务基础设施却未经宣布地修改。因此,当前的评估常常适用于表面标签而非可识别和不同的系统。为了解决这个问题,我们提出引用安全性作为AI评估的新范式。基本安全问题不仅涉及模型是否安全,还涉及后续方能否最终确定特定安全声明所针对的是哪个系统。这种方法将模型身份重新定义为经验上可验证的属性,并将引用稳定性与其所制约的实质性安全声明分开。该框架为当前实践处理不善的三个关键工作流带来了可处理性。具体来说,它实现了可重复评估、纵向审计有效性和跨提供商等价性。通过将这些评估建立在可验证的工件上,我们的方法确保安全审计和监管发现在动态系统的整个操作生命周期中保持其实证效用。

English abstract

Security evaluations inherently depend on stable identifiers. Any finding, audit, or regulatory decision must remain attached to the specific artifact it pertains to. Continuously updated artificial intelligence systems violate this core assumption, with public model designations remaining static while underlying weights, prompts, retrieval mechanisms, misuse classifiers, inference settings, and serving infrastructures undergo unannounced modifications. Consequently, current evaluations frequently apply to superficial labels rather than identifiable and distinct systems. To resolve this, we propose referential security as a new paradigm for AI evaluation. The fundamental security question extends beyond whether a model is safe to whether subsequent parties can conclusively determine which system a specific safety claim addressed. This approach reframes model identity as an empirically verifiable property and separates referential stability from the substantive security claims it conditions. This framework brings tractability to three critical workflows that current practices handle poorly. Specifically, it enables reproducible evaluation, longitudinal audit validity, and cross-provider equivalence. By grounding these evaluations in verifiable artifacts, our approach ensures that safety audits and regulatory findings maintain their empirical utility across the operational lifecycle of dynamic systems.
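One simple way to make a system identifier verifiable, assuming the evaluator can see the deployed artifacts, is to content-address a canonical manifest of everything that determines behavior. The manifest fields below are hypothetical, and this hashing scheme is an illustration rather than the paper's proposal; black-box identity verification, where artifacts are not disclosed, is not covered.

```python
import hashlib
import json
from typing import Any, Dict


def system_fingerprint(manifest: Dict[str, Any]) -> str:
    """SHA-256 of a canonical JSON encoding, so key order and whitespace cannot change the ID."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def claim_applies(claim: Dict[str, str], manifest: Dict[str, Any]) -> bool:
    """A safety claim is reusable only if the system it names is the one now being served."""
    return claim["system_id"] == system_fingerprint(manifest)
```

Changing only the system prompt leaves a public model name untouched but produces a new identifier, so an earlier safety claim no longer transfers silently.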

2605.25672 2026-05-26 cs.RO

Compliant Non-Prehensile Pushing Manipulation

顺应性非抓取推动操作

Francesco Cufino, Mario Selvaggio, Fabio Amadio, Fabio Ruggiero

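AI summary: For non-prehensile pushing with compliant robotic systems, proposes a framework based on impedance control and model predictive control that achieves compliant pushing by optimizing position/velocity set-points, and integrates an energy-tank passivity filter to guarantee safe interaction.

Chinese abstract (AI-generated)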

在本文中,我们解决了使用顺应性机器人操作系统执行非抓取推动操作的挑战。为了确保在人类环境中安全操作,机器人必须顺从外部物理交互并表现出被动行为。为此,我们扩展了最先进的推动模型,将其与阻抗控制机器人集成。我们开发了一个基于该模型的模型预测控制框架,通过最优调节机器人的位置/速度设定点来实现顺应性推动,同时实现所需的推动力和接触点适应,以获得期望的物体运动。然而,外部交互可能导致跟踪误差,从而引起推动力潜在的无限增加。为了防止这种情况,我们集成了一个能量罐无源性滤波器,进一步调节机器人速度设定点以保证无源性并避免不受控制的能量积累。所提出的方法已在仿真中严格测试,并通过两个不同机器人系统的实验验证,展示了在人机交互过程中的被动顺应性,并评估了轨迹跟踪性能和对物体物理参数变化的鲁棒性。

English abstract

In this paper, we address the challenge of performing non-prehensile pushing with a compliant robotic manipulation system. To operate safely in human-populated environments, robots must comply with external physical interactions and exhibit passive behavior. To this end, we extend a state-of-the-art pushing model and integrate it with impedance-controlled robots. Building on this model, we develop a model predictive control framework that enables compliant pushing by optimally modulating the robot's position/velocity set-point, jointly realizing the required pushing force and contact-point adaptation to obtain the desired object motion. However, external interactions may induce tracking errors, potentially causing an unbounded increase in the pushing force. To prevent this, we integrate an energy-tank passivity filter that further modulates the robot's velocity set-point to guarantee passivity and avoid uncontrolled energy buildup. The proposed method has been rigorously tested in simulation and validated through experiments on two different robotic systems, which demonstrate passive compliance during human-robot interaction and assess trajectory-tracking performance and robustness to variations in the object's physical parameters.
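A one-dimensional energy-tank filter illustrates how a velocity set-point can be scaled so that the controller never injects more energy than the tank holds. This is a generic sketch under assumed parameters (`e_min`, `e_max`, a scalar force and velocity), not the paper's filter, which acts on the MPC's position/velocity set-points.

```python
from dataclasses import dataclass


@dataclass
class EnergyTank:
    energy: float          # current tank level [J]
    e_min: float = 0.1     # assumed lower bound that must never be crossed [J]
    e_max: float = 2.0     # assumed upper bound to avoid unbounded storage [J]

    def filter_velocity(self, v_des: float, force: float, dt: float) -> float:
        """Scale a 1-D velocity set-point so the energy it would inject fits in the tank.

        power = force * v is the rate at which the controller injects energy into the
        environment; injections are paid from the tank, extractions refill it.
        """
        if dt <= 0.0:
            raise ValueError("dt must be positive")
        power = force * v_des
        if power <= 0.0:  # energy flows back into the tank: always passive
            self.energy = min(self.e_max, self.energy - power * dt)
            return v_des
        budget = max(0.0, self.energy - self.e_min)
        alpha = min(1.0, budget / (power * dt))  # 0 <= alpha <= 1
        self.energy -= alpha * power * dt
        return alpha * v_des
```

When the tank is nearly empty the set-point is scaled toward zero instead of letting a persistent tracking error keep pumping energy into the contact.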

2605.25665 2026-05-26 cs.SE cs.AI

Meta-Engineering Harnesses for AI-Native Software Production: A Contract-Driven Adversarial Verification Architecture with Early Deployment Report

面向AI原生软件生产的元工程框架:一种基于合约的对抗性验证架构及早期部署报告

Satadru Sengupta, Tamunokorite Briggs, Ivan Myshakivskyi

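AI summary: Proposes a meta-engineering harness that uses contracts, role-specialized AI agents, and adversarial verification to continuously produce, verify, and improve AI-native software, and validates its reliability by deploying 17 features in a CTO-as-a-service setting for small service firms.

Comments
17 pages, 2 figures, early deployment report
Chinese abstract (AI-generated)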

AI原生软件开发通常在单个模型、提示或生成工件的层面进行评估。这种框架对于生产环境是不够的,在这些环境中,软件必须在多个操作上下文和长时间跨度内持续生产、验证、部署、维护和适应。我们提出了一种元工程框架:一种软件生产架构,它将操作和产品特性需求转化为明确的合约,通过角色专业化的AI代理分配工作,执行独立和对抗性验证,并通过结构化失败分类和外环校准持续自我改进。该框架专为软件交付不是一次性项目而是持续运营功能的场景设计。在我们的激励应用——面向小型服务公司的CTO即服务中,该系统将网站、预订流程、支付系统、后台工作流自动化和AI代理接口作为持续演进的技术基础设施进行管理,而非一次性交付物。我们描述了分层架构,包括两遍合约编译、带有专业化记录的持久化Markdown记忆、基于注意力和独立性的验证、四路失败仲裁器以及外环校准。我们报告了早期生产部署的结果,该部署跨越数周,涵盖17项功能,包括一个详细的应用内支付案例研究,揭示了合约不完整性和验证边界问题。这些观察直接推动了框架的针对性改进。贡献在于实现了一个可测量、可扩展的验证架构,使AI原生服务即软件生产变得可靠、可审计且可随时间改进。

English abstract

AI-native software development is often evaluated at the level of individual models, prompts, or generated artifacts. This framing is insufficient for production environments where software must be continuously produced, verified, deployed, maintained, and adapted across many operational contexts and long time horizons. We present a meta-engineering harness: a software-production architecture that transforms operational and product feature requirements into explicit contracts, routes work through role-specialized AI agents, performs independent and adversarial verification, and continuously improves itself through structured failure classification and outer-loop calibration. The harness is designed for settings in which software delivery is not a one-time project but an ongoing operating function. In our motivating application, CTO-as-a-service for small service firms, the system manages websites, booking flows, payment systems, back-office workflow automations, and AI-agent interfaces as continuously evolving technical infrastructure rather than one-off deliverables. We describe the layered architecture, including two-pass contract compilation, persistent markdown memory with specialization records, attention-based and independence-based verification, a four-way failure arbiter, and outer-loop calibration. We report results from an early production deployment spanning 17 features over several weeks, including a detailed in-app payments case study that revealed contract incompleteness and verification-boundary issues. These observations directly drove targeted improvements to the harness. The contribution is an implemented, measurable, and extensible verification architecture for making AI-native service-as-a-software production reliable, auditable, and improvable over time.

2605.25664 2026-05-26 cs.HC cs.AI cs.AR cs.CY

Posture Clip: Sit properly or I wont let you work

Posture Clip:坐姿端正,否则不让你工作

Arka Majhi, Aparajita Mondal

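AI summary: Proposes PostureClip, a collar-clipped device that keeps users from working while hunched by blacking out the screen and restoring it once posture is corrected; experiments show that it significantly improves posture angle and reduces time spent bent.

Journal ref
Wearable Technologies, 7, e5 (2026)
Comments
Published online by Cambridge University Press on 14 May 2026
Chinese abstract (AI-generated)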

不良姿势因其对健康和生产率的有害影响而成为一个重要问题。本文提出了一种名为PostureClip的衣夹式设备,旨在通过黑屏并在纠正姿势后恢复屏幕,限制用户以弯腰角度坐着工作,从而促进更好的姿势。该设备集成了传感器和反馈机制,为用户提供实时姿势反馈。为了评估PostureClip的有效性,进行了一项对照实验,参与者(n=165)每天使用笔记本电脑/个人电脑工作超过6小时。参与者被随机分配到干预组(IG1,n=54;IG2,n=55),使用衣夹式设备,以及对照组(CG,n=56),不使用该设备。IG1未收到反馈,而IG2通过通知并进一步使屏幕变暗从设备获得反馈。研究在参与者的办公室环境中进行,持续4周,收集了姿势角度、弯腰持续时间以及用户反馈等指标。分析显示,与无反馈组和对照组(未干预)相比,使用带反馈的PostureClip的参与者组在姿势角度上有显著改善(p<0.001),弯腰持续时间显著减少(p<0.01)。用户反馈的定性分析强调了该设备的易用性、提供及时反馈的有效性以及对参与者姿势意识和习惯的积极影响。这些结果表明,PostureClip是促进久坐工作中更好姿势的有效工具。

English abstract

Poor posture is a significant concern due to its detrimental effects on health and productivity. This paper presents PostureClip, a collar-clipped device designed to prevent users from sitting and working at a bent angle by blacking out the screen and restoring it once posture is corrected, thereby promoting better posture. The device integrates sensors and feedback mechanisms to give users real-time posture feedback. To evaluate the effectiveness of PostureClip, a controlled experiment was conducted with participants (n=165) who worked on a laptop/PC for more than 6 hours per day. Participants were randomly assigned to two intervention groups (IG1, n=54; IG2, n=55), which used the collar-clipped device, or to a control group (CG, n=56), which did not. IG1 received no feedback, whereas IG2 received feedback from the device through notifications followed by screen darkening. The study was conducted in the participants' own office environments for 4 weeks, and metrics such as posture angle, duration of bent posture, and user feedback were collected. Analysis revealed a significant improvement in posture angle (p<0.001) and a significant reduction in bent-posture duration (p<0.01) for the group using PostureClip with feedback, compared with both the group without feedback and the control group (which received no intervention). Qualitative analysis of user feedback highlighted the device's ease of use, its effectiveness in providing timely feedback, and its positive impact on participants' awareness and habits regarding posture. These results indicate that PostureClip is an effective tool for promoting better posture during sedentary work.
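The blackout behavior can be sketched as a small state machine. The tilt thresholds, dwell time, and hysteresis gap below are hypothetical values, not the device's calibration, and the notification stage that IG2 received before screen darkening is omitted.

```python
from dataclasses import dataclass


@dataclass
class PostureGate:
    bent_threshold_deg: float = 20.0     # hypothetical forward tilt counted as "bent"
    recover_threshold_deg: float = 15.0  # hypothetical hysteresis: straighten past this to resume
    dwell_s: float = 5.0                 # hypothetical time a bent posture is tolerated
    bent_for: float = 0.0                # seconds spent bent so far
    screen_on: bool = True

    def update(self, tilt_deg: float, dt: float) -> bool:
        """Feed one sensor reading taken dt seconds after the last; return whether the screen is on."""
        if self.screen_on:
            self.bent_for = self.bent_for + dt if tilt_deg > self.bent_threshold_deg else 0.0
            if self.bent_for >= self.dwell_s:
                self.screen_on = False  # black out until posture is corrected
        elif tilt_deg < self.recover_threshold_deg:
            self.screen_on = True       # posture corrected: resume work
            self.bent_for = 0.0
        return self.screen_on
```

The gap between the two thresholds prevents the screen from flickering when the wearer hovers around a single cut-off angle.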

2605.25663 2026-05-26 cs.LG cs.CV

Opportunistic Target Selection: Early Directional Commitment for Query-Efficient Black-Box Adversarial Attacks

机会目标选择:面向查询高效黑盒对抗攻击的早期定向承诺

Florent Tariolle, Florian Yger

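AI summary: Proposes OTS, a lightweight method that switches an untargeted attack to a targeted one early, locking onto the currently leading non-true class, thereby reducing query counts and improving success rates.

Comments
13 pages, 10 figures, 3 tables; code available at https://github.com/Tariolle/opportunistic-target-selection
Chinese abstract (AI-generated)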

仅最小化真实置信度的黑盒对抗攻击存在类别漂移问题:扰动在特征空间中游荡而不承诺特定对抗类别,浪费查询在分散、无方向的进展上。我们引入机会目标选择(OTS),一种轻量级包装器,在攻击轨迹早期将无目标攻击切换为有目标目标,锁定当前领先的非真实类别。OTS不需要对底层攻击进行架构修改,不需要梯度访问,也不需要先验的目标类别知识。我们在五个标准ImageNet分类器(4500次运行)上对三种基于分数的攻击(SimBA、使用交叉熵损失的Square Attack和Bandits)验证了OTS。在随机搜索攻击上,OTS紧密跟踪oracle性能,在ResNet-50上成功率提升高达27个百分点,审查均值迭代次数相对减少43%。在梯度估计攻击(Bandits)和边际损失攻击上,OTS是冗余的,这一负面结果强化了我们将OTS解释为边际损失替代的观点。在对抗训练模型上,双峰难度分布消除了目标帮助的机制。

English abstract

Black-box adversarial attacks that minimize only the ground-truth confidence suffer from class drift: perturbations wander through the feature space without committing to a specific adversarial class, wasting queries on diffuse, undirected progress. We introduce Opportunistic Target Selection (OTS), a lightweight wrapper that switches an untargeted attack to a targeted objective early in its trajectory, locking onto whichever non-true class currently leads. OTS requires no architectural modification to the underlying attack, no gradient access, and no a priori target-class knowledge. We validate OTS on three score-based attacks (SimBA, Square Attack with cross-entropy loss, and Bandits) across five standard ImageNet classifiers (4,500 runs). On random-search attacks, OTS closely tracks oracle performance, with gains up to +27 pp in success rate and 43% relative reduction in censored-mean iterations on ResNet-50. On gradient-estimation attacks (Bandits) and attacks with margin loss, OTS is redundant, a negative result that reinforces our interpretation of OTS as a margin-loss surrogate. On adversarially-trained models, a bimodal difficulty distribution eliminates the regime where targeting helps.
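The wrapper idea can be sketched as a loss object that any score-based attack queries. The switching rule here (a fixed query count `switch_at`) and the targeted loss (negative log-probability of the target) are assumptions for illustration; the abstract does not state the exact switching criterion.

```python
import math
from typing import List, Optional


class OpportunisticTarget:
    """Wrap an untargeted score-based objective and commit to a target class early.

    Before `switch_at` queries the loss is the true-class probability (untargeted);
    at the switch, the currently leading non-true class becomes a fixed target and
    the loss becomes -log p(target). The attack minimizes whatever this returns.
    """

    def __init__(self, true_label: int, switch_at: int):
        self.true_label = true_label
        self.switch_at = switch_at  # hypothetical switching rule: a fixed query count
        self.target: Optional[int] = None
        self.queries = 0

    def loss(self, probs: List[float]) -> float:
        self.queries += 1
        if self.target is None and self.queries >= self.switch_at:
            others = [c for c in range(len(probs)) if c != self.true_label]
            self.target = max(others, key=lambda c: probs[c])
        if self.target is None:
            return probs[self.true_label]
        return -math.log(max(probs[self.target], 1e-12))
```

Because the wrapper only changes the scalar the attack minimizes, the underlying search procedure needs no modification.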

2605.25662 2026-05-26 cs.LG

Closed-Form Node Classification with Exact Graph Unlearning

具有精确图遗忘的闭式节点分类

Aditya Gaur, Charu Sharma

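AI summary: Proposes a closed-form framework routed by adjusted homophily whose closed-form solvers (SGC + Ridge regression, or LCF-Net) match or exceed graph neural network performance while enabling fast, exact graph-unlearning updates and privacy analysis.

Comments
19 pages, 5 figures, 12 tables (7 main + 5 appendix)
Chinese abstract (AI-generated)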

用于节点分类的图神经网络通常通过梯度下降训练数百或数千个epoch。最近的工作表明,当适当调整时,经典的GCN/SAGE/GAT架构可以在许多节点分类基准上匹配图变换器。我们提出一个互补的问题:通过确定性闭式求解器能恢复多少性能,以及这能提供什么保证? 我们引入了一个由调整同配性选择的路由闭式框架。对于同配图,我们使用SGC风格的传播后接Ridge回归;对于异配图,我们引入LCF-Net,一种逐层闭式图特征精炼网络,其每层Ridge求解由高斯核-Ridge头部限制。在14个基准上,包括ogbn-arxiv和ogbn-proteins,我们的闭式预测器在9个测量数据集中的9个上匹配或击败了最佳普通2层GCN/SAGE/GAT,在12个小基准中的9个上在1个标准差内与调优的深度配方持平,并在两个大图上超过了OGB排行榜的普通GCN。剩余的异配差距紧密跟踪从普通2层到深度SAGE的增益,表明残差差异主要是架构性的。 由于我们的预测器是确定性线性系统的显式解,修改后的图输入可以重新求解以获得重训练等效参数。我们形式化了标签、特征、边、节点和子图修改的精确图对象遗忘,证明了Ridge组件的K跳局部性,并在109个配置上验证了精确性。在ogbn-arxiv上,局部更新比完全重新求解快21-45倍,比梯度重训练快约10^6倍。结构反演实验进一步量化了精确重训练的隐私下限和近似图遗忘方法的额外泄漏。

English abstract

Graph neural networks for node classification are typically trained by gradient descent over hundreds or thousands of epochs. Recent work has shown that, when properly tuned, classic GCN/SAGE/GAT architectures can match graph transformers on many node-classification benchmarks. We ask a complementary question: how much of this performance can be recovered by deterministic closed-form solvers, and what guarantees does this enable? We introduce a routed closed-form framework selected by adjusted homophily. For assortative graphs, we use SGC-style propagation followed by Ridge regression; for heterophilous graphs, we introduce LCF-Net, a layer-wise closed-form graph feature-refinement network whose per-layer Ridge solves are capped by a Gaussian kernel-Ridge head. Across 14 benchmarks, including ogbn-arxiv and ogbn-proteins, our closed-form predictors match or beat the best vanilla 2-layer GCN/SAGE/GAT on 9 of 9 measured datasets, tie tuned deep recipes within one standard deviation on 9 of 12 small benchmarks, and exceed the OGB-leaderboard plain GCN on both large graphs. The remaining heterophilous gap closely tracks the gain from vanilla 2-layer to deep SAGE, suggesting that the residual difference is primarily architectural. Because our predictors are explicit solutions of deterministic linear systems, modified graph inputs can be re-solved to obtain retrain-equivalent parameters. We formalize exact graph-object unlearning for label, feature, edge, node, and subgraph modifications, prove K-hop locality for Ridge components, and verify exactness across 109 configurations. On ogbn-arxiv, localized updates give $21$--$45\times$ speedups over full re-solving and roughly $10^{6}\times$ speedups over gradient retraining. Structural-inversion experiments further quantify the privacy floor of exact retraining and the additional leakage of approximate graph-unlearning methods.
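The assortative branch (SGC-style propagation followed by Ridge regression) can be sketched in dependency-free Python. This is an illustration, not the authors' code: the propagation depth `k`, the regularizer `lam`, and the helper names are assumptions, and LCF-Net and the adjusted-homophily router are not shown.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a X = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + rhs[:] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def sgc_features(adj: Matrix, x: Matrix, k: int) -> Matrix:
    """S^k X with S = D^-1/2 (A + I) D^-1/2: parameter-free propagation."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    d = [sum(row) ** -0.5 for row in a_hat]
    s = [[d[i] * a_hat[i][j] * d[j] for j in range(n)] for i in range(n)]
    for _ in range(k):
        x = matmul(s, x)
    return x


def ridge_fit(z: Matrix, y: Matrix, lam: float) -> Matrix:
    """W = (Z^T Z + lam I)^-1 Z^T Y, computed as one deterministic linear solve."""
    zt = transpose(z)
    gram = matmul(zt, z)
    for i in range(len(gram)):
        gram[i][i] += lam
    return solve(gram, matmul(zt, y))
```

Because `ridge_fit` is a deterministic solve, deleting a node's label or features and calling it again yields exactly the parameters that retraining from scratch would produce, which is the property exact unlearning relies on.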

2605.25661 2026-05-26 cs.CV

DRM: Diffusion-based Reward Model With Step-wise Guidance

DRM: 基于扩散的奖励模型与逐步引导

Jaxon Zhang, Binxin Yang, Hubery Yin, Chen Li, Jing Lyu

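AI summary: Proposes the Diffusion-based Reward Model (DRM), which uses a pre-trained diffusion model as its evaluation backbone and exploits its step-wise evaluation capability to improve reinforcement-learning alignment and inference-time sampling, raising image-generation quality.

Chinese abstract (AI-generated)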

当前主流将扩散模型与人类偏好对齐的方法通常采用基于VLM的奖励模型。然而,这些为语义对齐预训练的奖励模型难以捕捉关键的感知质量,如美学、构图和视觉和谐。在这项工作中,我们认为一个能够高保真生成的模型必须对这些视觉属性有深刻理解。基于这一见解,我们引入了基于扩散的奖励模型(DRM),这是一种新颖的范式,使用预训练的扩散模型作为强大的评估骨干。DRM的一个关键优势是其独特的能力,不仅可以评估最终图像,还可以评估生成过程中任何阶段的噪声中间潜变量。我们以两种方式利用这种逐步评估能力。首先,我们提出了逐步GRPO,一种强化学习算法,提供密集的每步奖励,以解决GRPO算法中不精确的信用分配问题,从而实现更稳定和有效的对齐。其次,我们引入了逐步采样,一种新颖的推理策略,使用DRM作为动态引导,在每一步评估多个生成路径,引导过程朝向更高质量的结果。大量实验证实,我们的方法显著提升了生成图像的最终质量。代码:https://github.com/jjaxonx/DRM。

English abstract

Current mainstream methods for aligning diffusion models with human preferences typically employ VLM-based reward models. However, these reward models, pre-trained for semantic alignment, struggle to capture essential perceptual qualities such as aesthetics, composition, and visual harmony. In this work, we argue that a model capable of high-fidelity generation must possess a profound understanding of these visual attributes. Based on this insight, we introduce the Diffusion-based Reward Model (DRM), a novel paradigm that uses a pre-trained diffusion model as a powerful evaluative backbone. A key advantage of DRM is its unique ability to assess not only the final image but also the noisy intermediate latents at any stage of the generative process. We leverage this step-wise evaluative capacity in two ways. First, we propose Step-wise GRPO, a reinforcement learning algorithm that provides dense, per-step rewards to resolve the imprecise credit assignment of the GRPO algorithm, leading to more stable and effective alignment. Second, we introduce Step-wise Sampling, a novel inference strategy that employs DRM as a dynamic guide to evaluate multiple generation paths at each step, steering the process toward higher-quality outcomes. Extensive experiments confirm that our approach significantly enhances the final quality of generated images. Code: https://github.com/jjaxonx/DRM.
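Two step-wise ideas can be sketched independently of any particular diffusion model: GRPO-style group normalization applied separately at each denoising step, and greedy best-of-N selection at each step. Both are simplified assumptions about how DRM's per-step scores might be used, not the paper's algorithms; `propose` and `score` are hypothetical stand-ins for the sampler and the diffusion-based reward model.

```python
import statistics
from typing import Callable, List, TypeVar

T = TypeVar("T")


def stepwise_group_advantages(rewards: List[List[float]], eps: float = 1e-8) -> List[List[float]]:
    """rewards[i][t] is the reward of sample i in a group at denoising step t.

    Normalizing within the group at every step gives a dense per-step advantage
    instead of a single terminal advantage shared by all steps.
    """
    num_steps = len(rewards[0])
    adv = [[0.0] * num_steps for _ in rewards]
    for t in range(num_steps):
        col = [r[t] for r in rewards]
        mu, sd = statistics.fmean(col), statistics.pstdev(col)
        for i, r in enumerate(col):
            adv[i][t] = (r - mu) / (sd + eps)
    return adv


def stepwise_sample(x0: T, steps: int,
                    propose: Callable[[T, int, int], T],
                    score: Callable[[T, int], float],
                    num_candidates: int = 4) -> T:
    """Greedy guided sampling: at each step keep the candidate the reward model scores highest."""
    x = x0
    for t in range(steps):
        candidates = [propose(x, t, k) for k in range(num_candidates)]
        x = max(candidates, key=lambda c: score(c, t))
    return x
```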

2605.25659 2026-05-26 cs.CV

StreamChar: Long-Horizon Streaming Character Audio-Video Generation with Decoupled Orchestration

StreamChar: 基于解耦编排的长时程流式角色音频-视频生成

Linrui Tian, Qi Wang, Bang Zhang

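AI summary: Proposes StreamChar, a streaming framework that decouples long-horizon orchestration (an LLM orchestrator) from short-window denoising (a joint audio-video DiT), achieving real-time, stable, high-quality character animation.

Chinese abstract (AI-generated)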

实时流式联合音频-视频生成用于角色动画需要生成器说出请求的文本、跨块保持视觉身份并在严格的播放预算内运行。这些要求难以同时满足:逐块自回归生成会累积文本-音频错位和视觉漂移,而低延迟所需的少步蒸馏通常会降低空间多样性和时间质量。我们提出StreamChar,一种将长时程编排与短窗音频-视频去噪分离的流式框架。基于LLM的编排器使用文本和历史上下文生成帧对齐的音频条件,联合音频-视频DiT在参考和运动帧条件下执行局部双向去噪。为高效部署,我们使用两阶段蒸馏流程,首先压缩采样器,然后在在线块展开下微调学生模型。进度感知指针在展开训练期间将部分文本与生成的音频对齐,而汇块记忆提供持久视觉锚点以减少长时程漂移。在短片段和长时程协议上的实验表明,StreamChar在单个H100 GPU上实时运行,与最近的联合和音频驱动基线相比,在文本保真度、音视频同步、视觉质量和流式稳定性方面提供了有利的系统级权衡。

English abstract

Real-time streaming joint audio-video generation for character animation requires a generator to speak the requested transcript, maintain visual identity across chunks, and run within a strict playback budget. These requirements are difficult to satisfy simultaneously: chunk-wise autoregressive generation can accumulate transcript-audio misalignment and visual drift, while the few-step distillation needed for low latency often degrades spatial diversity and temporal quality. We present StreamChar, a streaming framework that separates long-horizon orchestration from short-window audio-video denoising. An LLM-based orchestrator uses the transcript and historical context to produce frame-aligned audio conditions, and a joint audio-video DiT performs local bidirectional denoising with reference and motion-frame conditioning. For efficient deployment, we use a two-stage distillation pipeline that first compresses the sampler and then fine-tunes the student under online chunk rollouts. A progress-aware pointer aligns partial transcripts with generated audio during rollout training, and a sink-chunk memory provides a persistent visual anchor for reducing long-horizon drift. Experiments on short-clip and long-horizon protocols show that StreamChar runs in real time on a single H100 GPU and provides a favorable system-level trade-off among transcript fidelity, audio-visual synchronization, visual quality, and streaming stability compared with recent joint and audio-driven baselines.
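A sink-chunk memory can be sketched as a pinned anchor plus a bounded window of recent chunks. The assumption that the sink is the first generated chunk, and the window size, are illustrative; in practice the chunks would be latents or frames rather than strings.

```python
from collections import deque
from typing import Deque, Generic, List, Optional, TypeVar

T = TypeVar("T")


class SinkChunkMemory(Generic[T]):
    """Conditioning context for chunk-wise streaming generation.

    The first chunk is pinned as a persistent "sink" anchor, while only the most
    recent `window` chunks are kept as short-range motion context, so the context
    size stays bounded however long the stream runs.
    """

    def __init__(self, window: int):
        self.sink: Optional[T] = None
        self.recent: Deque[T] = deque(maxlen=window)

    def push(self, chunk: T) -> None:
        if self.sink is None:
            self.sink = chunk
        else:
            self.recent.append(chunk)  # the oldest recent chunk is evicted automatically

    def context(self) -> List[T]:
        return ([self.sink] if self.sink is not None else []) + list(self.recent)
```

A constant context size keeps the per-chunk cost fixed, which is what a strict playback budget requires over long horizons.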

2605.25658 2026-05-26 cs.CL cs.AI

AutoSG: LLM-Driven Solver Generation Solely from Task Prompts for Expensive Optimization

AutoSG: 仅从任务提示出发的LLM驱动的昂贵优化求解器生成

Haoran Gu, Handing Wang, Yi Mei, Mengjie Zhang

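AI summary: Proposes AutoSG, which uses retrieval-augmented generation, one-step self-refinement, and instance-free evaluation to generate executable customized solvers directly from natural-language prompts, addressing hallucination, structure destruction, and evaluation cost in expensive optimization.

Chinese abstract (AI-generated)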

昂贵优化任务在现实应用中普遍存在,需要高度专业化的求解器。虽然LLM驱动的自动求解器生成显示出前景,但当前范式在处理昂贵优化时面临三个关键问题:由于领域知识不足导致的事实幻觉、在细化过程中频繁破坏先前建立的局部最优结构,以及在训练实例上执行带来的高昂评估成本和受限的泛化能力。为了解决这些问题,我们引入了AutoSG,一个完全自动化的流程,直接将自然语言提示转换为可执行的定制求解器。AutoSG具有三个核心创新:一个检索增强的求解器生成模块,严格将代码基于经过验证的文献;一个单步自优化算子,在保留关键结构组件的同时引入特定任务的改进;以及一个基于Elo的无实例LLM-as-a-Judge评估机制,快速建立全局排名。在多种昂贵优化任务上的广泛评估证实,AutoSG显著优于人工设计的最先进框架和现有的LLM生成的求解器。

English abstract

Expensive optimization tasks are ubiquitous in real-world applications and demand highly specialized solvers. While LLM-driven automated solver generation shows promise, current paradigms face three critical issues when tackling expensive optimization: factual hallucinations caused by deficient domain knowledge; the frequent dismantling of previously established, locally optimal structures during refinement; and prohibitive evaluation costs together with restricted generalization, both caused by executing candidate solvers on training instances. To address these issues, we introduce AutoSG, a fully automated workflow that translates natural-language prompts directly into executable customized solvers. AutoSG features three core innovations: a retrieval-augmented solver generation module that strictly grounds code in verified literature; a one-step self-refinement operator that introduces task-specific improvements while preserving critical structural components; and an instance-free, Elo-based LLM-as-a-Judge evaluation mechanism that rapidly establishes global rankings. Extensive evaluations across diverse expensive optimization tasks confirm that AutoSG significantly outperforms both human-designed state-of-the-art frameworks and existing LLM-generated solvers.
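The global ranking step can be illustrated with the standard Elo update over pairwise verdicts. In AutoSG the verdicts would come from an LLM judge comparing solvers without running them on instances; here the match list is supplied directly, and `k=32` with an initial rating of 1500 are conventional defaults rather than values from the paper.

```python
from typing import Dict, Iterable, Tuple


def elo_rank(players: Iterable[str],
             matches: Iterable[Tuple[str, str, float]],
             k: float = 32.0, init: float = 1500.0) -> Dict[str, float]:
    """matches: (a, b, score_a) with score_a = 1 (a wins), 0.5 (tie), or 0 (b wins)."""
    rating = {p: init for p in players}
    for a, b, s_a in matches:
        # Expected score of a given the current rating gap.
        e_a = 1.0 / (1.0 + 10 ** ((rating[b] - rating[a]) / 400.0))
        rating[a] += k * (s_a - e_a)
        rating[b] += k * ((1.0 - s_a) - (1.0 - e_a))
    return rating
```

Sorting the returned dictionary by rating gives the global ranking; each update only needs the two ratings involved, so new solvers can be slotted in with a few comparisons.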

2605.25657 2026-05-26 cs.CV

ARMA-C3: A Contrastive ARMA Convolutional Framework for Unsupervised and Semi-supervised Classification

ARMA-C3: 一种用于无监督和半监督分类的对比ARMA卷积框架

VSS Tejaswi Abburi, Saurabh J. Shigwan, Nitin Kumar

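AI summary: Proposes ARMA-C3, which uses contrastive learning and graph-cut regularization to learn discriminative graph-node representations in unsupervised and semi-supervised settings, performing well on several medical imaging datasets.

Chinese abstract (AI-generated)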

在生物医学和神经退行性疾病中,由于标记数据的稀缺和成像模式的复杂性,准确和早期疾病识别仍然具有挑战性。为了解决这些问题,我们引入了ARMA-C3,一个统一的无监督和半监督图学习框架,用于基于对比学习和图割正则化的节点分类,以学习结构上有意义且具有判别性的表示。通过将样本或图像建模为图节点并利用样本间关系,所提出的框架捕获了传统机器学习方法通常忽略的受试者级别依赖关系。我们在五个临床相关数据集上进行了广泛的二分类实验:阿尔茨海默病神经影像学倡议(ADNI)、额颞叶痴呆神经影像学(NIFD)数据集以及三个医学影像基准(BreastMNIST、PneumoniaMNIST和一个肝脏超声数据集)。实验结果表明,ARMA-C3在多个评估设置中,特别是在有限监督和严重类别不平衡下,与经典聚类技术、最先进的机器学习模型以及现有的基于图的深度学习方法相比,取得了具有竞争力且通常更优越的性能。所提出的框架进一步展示了在多样化生物医学成像模态中的鲁棒表示学习和强跨模态泛化能力。

English abstract

In biomedical and neurodegenerative disorders, accurate and early disease identification remains challenging due to the scarcity of labeled data and the complexity of imaging patterns. To address these challenges, we introduce ARMA-C3, a unified unsupervised and semi-supervised graph learning framework for node classification based on contrastive learning and graph-cut regularization to learn structurally meaningful and discriminative representations. By modeling samples or images as graph nodes and exploiting inter-sample relationships, the proposed framework captures subject-level dependencies that conventional machine learning methods typically overlook. We conduct extensive binary classification experiments across five clinically relevant datasets: the Alzheimer's Disease Neuroimaging Initiative (ADNI), the Neuroimaging in Frontotemporal Dementia (NIFD) dataset, and three medical imaging benchmarks (BreastMNIST, PneumoniaMNIST, and a liver ultrasound dataset). Experimental results demonstrate that ARMA-C3 achieves competitive and frequently superior performance compared to classical clustering techniques, state-of-the-art machine learning models, and existing graph-based deep learning approaches across multiple evaluation settings, particularly under limited supervision and severe class imbalance. The proposed framework further demonstrates robust representation learning and strong cross-modal generalization across diverse biomedical imaging modalities.
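The ARMA graph convolution that gives the framework its name can be sketched on scalar node features, following the recursive ARMA filter form $h \leftarrow \mathrm{ReLU}(\hat{L} h w + x v)$ with $\hat{L} = D^{-1/2} A D^{-1/2}$, averaged over parallel stacks. The untrained scalar weights are placeholders, the contrastive objective and graph-cut regularizer are not shown, and how ARMA-C3 instantiates the layer is not specified in the abstract.

```python
from typing import List, Sequence, Tuple


def arma_conv(adj: Sequence[Sequence[float]], x: List[float],
              stacks: Sequence[Tuple[float, float]], num_iters: int) -> List[float]:
    """ARMA graph filter on scalar node features (toy, untrained weights).

    Each stack (w, v) iterates h <- relu(L_hat h * w + x * v), where
    L_hat = D^-1/2 A D^-1/2 and x is re-injected as a skip term every iteration;
    the stack outputs are averaged.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    inv_sqrt = [d ** -0.5 if d > 0 else 0.0 for d in deg]
    l_hat = [[inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]
    out = [0.0] * n
    for w, v in stacks:
        h = x[:]
        for _ in range(num_iters):
            prop = [sum(l_hat[i][j] * h[j] for j in range(n)) for i in range(n)]
            h = [max(0.0, p * w + xi * v) for p, xi in zip(prop, x)]
        out = [o + hi / len(stacks) for o, hi in zip(out, h)]
    return out
```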

2605.25656 2026-05-26 cs.CV

Event-based Batting Impact Estimation

基于事件的击球冲击估计

Ryotaro Ishida, Wataru Ikeda, Ryosei Hara, Akemi Kobayashi, Toshitaka Kimura, Mariko Isogawa

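AI summary: Proposes using the high temporal resolution and high dynamic range of event cameras to estimate batting-impact timing from the weighted centroid distance between the detected ball and bat, and introduces a mask refinement network to bridge the domain gap between event frames and RGB images, reducing mean absolute error by about 63% under low light and severe occlusion.

Comments
Accepted to IEEE International Conference on Image Processing (ICIP) 2026. (c) 2026 IEEE. Personal use of this material is permitted
Chinese abstract (AI-generated)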

精确估计击球冲击时刻对于理解快速感觉运动控制至关重要。然而,由于时间分辨率不足和运动模糊,RGB相机难以完成此任务。同样,惯性测量单元(IMU)由于传感器侵入性和有限的时间精度,在实际比赛中不实用。为克服这些限制,我们提出了一种新颖框架,利用事件相机(具有微秒级分辨率和高动态范围)基于检测到的球与球棒之间的加权质心距离来估计冲击时刻。为解决事件帧与RGB图像之间的域差异(这会降低分割精度),我们生成高密度事件帧。然后,我们引入一个掩膜细化网络,利用这些帧和双向掩膜信息,并通过一种新颖的损失函数进行优化。在真实数据集上的实验表明,我们的方法在具有挑战性的条件下(包括低光环境和严重遮挡)实现了卓越的准确性,将平均绝对误差降低了约63%,优于基线方法。

English abstract

Estimating the precise timing of batting impact is crucial for understanding rapid sensorimotor control. However, this task is challenging for RGB cameras due to insufficient temporal resolution and motion blur. Similarly, Inertial Measurement Units (IMUs) are impractical in actual matches due to sensor intrusiveness and their limited temporal precision. To overcome these limitations, we propose a novel framework that leverages event-based cameras, which offer microsecond temporal resolution and high dynamic range, to estimate impact timing from the weighted centroid distance between the detected ball and bat. To address the domain gap between event frames and RGB images, which degrades segmentation accuracy, we generate high-density event frames. We then introduce a mask refinement network that leverages these frames and bidirectional mask information and is optimized with a novel loss function. Experiments on real-world datasets demonstrate that our method achieves superior accuracy under challenging conditions, including low-light environments and severe occlusions, reducing the Mean Absolute Error by approximately 63% relative to the baselines.
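The impact-timing criterion can be sketched as follows, assuming per-frame ball and bat masks weighted by event counts are already available (the segmentation and mask-refinement network are not shown). The impact is taken as the frame where the distance between the two weighted centroids is smallest; the actual method can work at event-timestamp resolution rather than frame indices.

```python
import math
from typing import List, Optional, Sequence, Tuple

Mask = Sequence[Sequence[float]]  # per-pixel weights, e.g. event counts inside a segmentation mask


def weighted_centroid(mask: Mask) -> Optional[Tuple[float, float]]:
    """(x, y) centroid weighted by pixel values, or None if the mask is empty."""
    total = sum(sum(row) for row in mask)
    if total == 0:
        return None
    cy = sum(y * w for y, row in enumerate(mask) for w in row) / total
    cx = sum(x * w for row in mask for x, w in enumerate(row)) / total
    return cx, cy


def impact_index(ball_masks: List[Mask], bat_masks: List[Mask]) -> Optional[int]:
    """Index of the frame where the ball-bat centroid distance is smallest."""
    best, best_d = None, math.inf
    for t, (ball, bat) in enumerate(zip(ball_masks, bat_masks)):
        cb, cr = weighted_centroid(ball), weighted_centroid(bat)
        if cb is None or cr is None:
            continue  # one of the objects was not detected in this frame
        d = math.dist(cb, cr)
        if d < best_d:
            best, best_d = t, d
    return best
```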