2605.18113 2026-05-28 cs.CL

iPOE: Interpretable Prompt Optimization via Explanations

Jiahui Li, Yarik Menchaca Resendiz, Sean Papay, Roman Klinger

Affiliations: Fundamentals of Natural Language Processing, University of Bamberg, Germany; Leibniz-Institut für Psychologie (ZPID), Trier, Germany

AI summary: Proposes iPOE, which automatically generates guidelines from explanations and then optimizes them to make prompt optimization interpretable. It improves performance by up to 39% across four datasets, and human and LLM judgments of which guidelines contribute agree with a Cohen's kappa of up to 0.65.

Abstract

Prompt optimization has often been framed as a discrete search problem to find high-performing and robust instructions for an LLM. However, the search result might not make it transparent why and where specific prompt changes lead to performance gains. This is in contrast to how humans are instructed for annotation tasks. Here, researchers carefully design annotation guidelines, leading to enhanced annotation consistency. Our paper aims at joining these two approaches and introduces iPOE, a novel interpretable prompt optimization strategy via explanations. We guide the prompt optimization process by automatically created guidelines from explanations of annotation decisions (either automatically generated or from humans). This set of guidelines is furthermore optimized by a series of operations, including removing, adding, shuffling, and merging. The resulting prompt includes guidelines that instruct the annotation, making the decision process of the LLM and the optimization transparent. It therefore also supports laypeople in the area of prompt optimization, particularly in challenging domains requiring expertise. In our experiments on four datasets, we find that iPOE can improve over the evaluated baselines by up to 39% and LLM explanations can replace human explanations in the proposed method. Moreover, our interpretability validation study demonstrates that humans and LLMs can substantially agree on which guidelines contribute to their annotations, achieving a Cohen's kappa score of up to 0.65.
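The agreement figure quoted above is a Cohen's kappa. As a reminder of what that statistic measures, here is a minimal sketch of the standard two-rater formula. The `human` and `llm` label lists are made-up illustrations, not data from the paper, and the function name is our own.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical judgments: did guideline i contribute to the annotation? (1 = yes)
human = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
llm = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
kappa = cohens_kappa(human, llm)  # observed 0.80, chance 0.52
print(round(kappa, 3))
```

Kappa discounts the agreement two raters would reach by chance, which is why 80% raw agreement in this toy example yields a value well below 0.8.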

2605.17929 2026-05-28 cs.RO

TacSE3: Equivariant SE(3) Motion Estimation from Low-Texture Visuotactile Images for In-Gripper Tracking and Compensation

Zhongyuan Liao, Junzhe Wang, Qingyang Liu, Zhenmin Huang, Jun Ma, Yi Cai, Fei Meng, Haobo Liang, Michael Yu Wang

Affiliations: Hong Kong Center for Construction Robotics, The Hong Kong University of Science and Technology, Hong Kong, China; The Hong Kong University of Science and Technology (Guangzhou), China; School of Engineering, Great Bay University, Dongguan, China

AI summary: Proposes TacSE3, a tactile motion-estimation pipeline that converts low-texture visuotactile observations into a decoupled 3D force field and estimates incremental rigid-body motion on SE(3); dual-sensor sensing reduces translation-rotation ambiguity, enabling in-gripper tracking and compensation.

Abstract

Robotic in-hand manipulation requires reliable object-motion tracking under frequent visual occlusion, yet low-texture visuotactile images provide few stable correspondences for conventional image- or geometry-matching methods. This paper presents TacSE3, a tactile motion-estimation pipeline that converts low-texture visuotactile observations into a decoupled three-dimensional force field and estimates incremental rigid-body motion on SE(3). The method derives planar translation from contact-centroid motion and estimates rotation primarily from shear-related tactile responses, yielding a physically interpretable signal for in-gripper tracking and compensation. Experiments with paired DM-Tac fingertip sensors show that dual-sensor sensing reduces translation-rotation ambiguity, supports rotation tracking across axes and object geometries, and provides a lightweight compensation signal that improves disturbance tolerance in downstream manipulation tasks without retraining the base policy.
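The idea of deriving planar translation from contact-centroid motion can be illustrated with a toy example. This is only a sketch under our own assumptions: the contact map is taken to be a small 2D grid of non-negative intensities, and the function names are hypothetical. The abstract does not describe the paper's actual force-field representation at this level of detail.

```python
def contact_centroid(contact_map):
    """Intensity-weighted centroid (row, col) of a 2D contact map."""
    total = sum(sum(row) for row in contact_map)
    if total == 0:
        raise ValueError("no contact detected")
    r = sum(i * v for i, row in enumerate(contact_map) for v in row) / total
    c = sum(j * v for row in contact_map for j, v in enumerate(row)) / total
    return r, c

def planar_translation(prev_map, curr_map):
    """Centroid displacement between two frames, in grid cells."""
    (r0, c0), (r1, c1) = contact_centroid(prev_map), contact_centroid(curr_map)
    return r1 - r0, c1 - c0

# Hypothetical frames: a single contact blob moves one cell down and one right.
prev_map = [[0, 0, 0, 0],
            [0, 2, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]]
curr_map = [[0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 2, 0],
            [0, 0, 0, 0]]
shift = planar_translation(prev_map, curr_map)
```

A centroid needs no texture correspondences, which fits the low-texture setting the abstract describes; rotation, by contrast, cannot be read from the centroid alone.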

2605.17842 2026-05-28 cs.LG

SNLP: Layer-Parallel Inference via Structured Newton Corrections

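Ligong Han, Kai Xu, Hao Wang, Akash Srivastava

Affiliations: Core AI, IBM

AI summary: Proposes Structured Newton Layer Parallelism (SNLP), a framework that treats the dependency between Transformer layers as a nonlinear residual equation and solves it in parallel with structured Newton corrections to speed up inference, reaching up to 2.58x speedup on 0.5B models.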

Comments Project webpage: https://github.com/phymhan/nanochat-snlp

Abstract

Autoregressive language models execute Transformer layers sequentially, creating a latency bottleneck that is not removed by conventional tensor or pipeline parallelism. We study whether this layerwise dependency can be relaxed by treating the hidden-state trace across layers as the solution of a nonlinear residual equation and solving it with parallel Newton-style updates. While this view is principled, exact Newton corrections require expensive Jacobian-vector products and naive fixed-point iterations are unstable on trained Transformers. We introduce Structured Newton Layer Parallelism (SNLP), a training and inference framework that replaces exact layer Jacobians with cheap architecture-induced surrogate dynamics. In residual Transformers, this yields Identity Newton (IDN), where the correction reduces to a prefix-sum-like update; in mHC-style architectures, HC Newton (HCN) uses the model's residual mixing matrix. We also study SNLP-aware training, including pretraining regularization and direct SNLP-forward SFT. Experiments on Nanochat-scale Transformers show that SNLP exposes a practical speed-quality frontier: on 0.5B models, it reaches up to 2.58x wall-clock speedup, and a less aggressive configuration reaches 1.40x speedup without increasing PPL. The useful tradeoff comes from the biased finite-iteration computation induced by IDN/HCN rather than exact recovery of the sequential trace. We further show that SNLP-forward SFT can preserve downstream task accuracy, and that SNLP can serve as a drafter for self-speculative decoding while a sequential verifier preserves output correctness.
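To see why replacing the layer Jacobian with the identity leads to a prefix-sum-like update, consider a residual stack x_{l+1} = x_l + f_l(x_l). The sketch below is our own reading of that idea, not the authors' implementation: states are scalars, the `layers` are arbitrary toy functions, and all names are hypothetical.

```python
import math
from itertools import accumulate

def sequential_trace(x0, layers):
    """Reference: run the residual layers one after another."""
    trace = [x0]
    for f in layers:
        trace.append(trace[-1] + f(trace[-1]))
    return trace

def identity_newton_step(trace, layers):
    """One sweep with every layer Jacobian approximated by the identity.

    Each f_l is evaluated on the current guess for x_l. These calls do not
    depend on each other, so they could run in parallel. The correction then
    collapses to x_{l+1} = x_0 + sum_{j <= l} f_j(x_j), i.e. a prefix sum.
    """
    branch = [f(x) for f, x in zip(layers, trace[:-1])]
    return [trace[0]] + [trace[0] + s for s in accumulate(branch)]

# Toy residual branches (assumed for illustration only).
layers = [lambda x, a=a: 0.5 * math.tanh(a * x) for a in (0.3, -0.7, 1.1, 0.5)]
x0 = 1.0
guess = [x0] * (len(layers) + 1)  # initial guess: every state equals the input
for _ in range(len(layers)):
    guess = identity_newton_step(guess, layers)
```

In this toy setting each sweep makes one more layer exact, so L sweeps reproduce the sequential trace. Stopping after fewer sweeps gives a cheaper but biased result, which matches the abstract's remark that the useful tradeoff comes from biased finite-iteration computation.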

2605.02503 2026-05-28 cs.AI

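GFMate: Empowering Graph Foundation Models with Test-time Prompt Tuning

Yan Jiang, Ruihong Qiu, Zi Huang

Affiliations: School of Electrical Engineering and Computer Science, The University of Queensland, Brisbane, Queensland, Australia

AI summary: Proposes GFMate, a pre-training-agnostic test-time graph prompt tuning method. Centroid and layer prompts avoid entanglement with source domains and pre-training strategies, and a test-time complementary learning objective exploits both labelled and unlabelled target-domain data, yielding improvements of up to 30.63% on 12 benchmark datasets.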

Abstract

Graph prompt tuning has shown great potential in graph learning by introducing trainable prompts to enhance the model performance in conventional single-domain scenarios. Recent research has extended graph prompts to improve Graph Foundation Models (GFMs) by few-shot tuning auxiliary prompts. Despite their progress, most existing methods embed source-domain information into prompts, which either serve as input to GFMs or are encoded during model pre-training. Such prompt entanglement with specific source domains and GFM pre-training strategy restricts their generalisability to other domains and different GFMs. Furthermore, existing GFM prompts merely rely on few-shot tuning for adaptation, neglecting the rich information in unlabelled target domain test data. Motivated by these insights, this paper aims to empower GFMs with pre-training-agnostic test-time graph prompt tuning, named GFMate. GFMate introduces centroid and layer prompts applied after pre-training on target domains, avoiding entanglement with specific source domains and model pre-training. In addition, a test-time complementary learning objective is devised to exploit both labelled and unlabelled target domain data for effective test-time prompt tuning. Extensive experiments on 12 benchmark datasets demonstrate the superior performance and efficiency of GFMate, achieving improvements of up to 30.63%. Code is available at https://github.com/YanJiangJerry/GFMate.

2605.14284 2026-05-28 cs.LG

Smooth Multi-Policy Causal Effect Estimation in Longitudinal Settings

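Wenxin Chen, Weishen Pan, Kyra Gan, Fei Wang

Affiliations: Cornell University

AI summary: For estimating the causal effects of multiple dynamic treatment policies, proposes a policy-aware reparameterization of iterative conditional expectation Q-functions, implemented as PEQ-Net, which estimates policies jointly through shared representations and trains the policy encoder with kernel mean embeddings to reduce finite-sample variance.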

Abstract

Comparative evaluation of multiple dynamic treatment policies is essential for healthcare and policy decisions, yet conventional longitudinal causal inference methods estimate each in isolation, preventing information sharing across counterfactuals. We demonstrate that this separate estimation paradigm induces a structurally uncontrolled second-order bias, inflating finite-sample variance even after standard debiasing with longitudinal targeted maximum likelihood estimation (LTMLE). To address this, we propose a policy-aware reparameterization of Iterative Conditional Expectation (ICE) Q-functions that enables joint estimation through shared representations. We implement this approach in the Policy-Encoded Q Network (PEQ-Net), an architecture centered on a shared policy encoder. The encoder is trained using kernel mean embeddings, ensuring that the learned representation space reflects population-level policy dissimilarities. After applying an LTMLE correction step, we prove this design imposes a structural constraint on the second-order remainder, thereby stabilizing finite-sample variance. Experiments on semi-synthetic datasets demonstrate that PEQ-Net consistently outperforms existing ICE-based methods, achieving substantial reductions in root-mean-square error, particularly when evaluating closely related policies.
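Kernel mean embeddings give a distance between distributions: the squared RKHS distance between two empirical embeddings is the (biased) squared maximum mean discrepancy. The sketch below shows that computation on made-up data. How PEQ-Net represents a policy is not stated in the abstract, so summarising each policy by a sample of scalar actions, the RBF kernel, and all names here are our assumptions.

```python
import math

def rbf(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-((x - y) ** 2) / (2 * bandwidth ** 2))

def mmd_squared(xs, ys, kernel=rbf):
    """Squared distance between empirical kernel mean embeddings (biased estimate)."""
    def mean_k(a, b):
        return sum(kernel(u, v) for u in a for v in b) / (len(a) * len(b))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2 * mean_k(xs, ys)

# Hypothetical action samples drawn under three treatment policies.
policy_a = [0.0, 1.0, 2.0]
policy_b = [0.1, 1.1, 2.1]   # nearly identical to policy_a
policy_c = [3.0, 4.0, 5.0]   # clearly different from policy_a

near = mmd_squared(policy_a, policy_b)
far = mmd_squared(policy_a, policy_c)
```

Closely related policies end up close in this geometry (`near` is much smaller than `far`), which is the kind of population-level dissimilarity structure the abstract says the encoder's representation space should reflect.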

2601.16312 2026-05-28 cs.CL cs.AI

Teaching and Evaluating LLMs to Reason About Polymer Design Related Tasks

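Dikshya Mohanty, Mohammad Saqib Hasan, Syed Mostofa Monsur, Size Zheng, Benjamin Hsiao, Niranjan Balasubramanian

Affiliations: Stony Brook University

AI summary: Introduces the PolyBench benchmark dataset and a knowledge-augmented reasoning distillation method that bring small- and mid-sized language models close to frontier closed-source LLMs on polymer design tasks.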

Abstract

Research in AI4Science has shown promise in many science applications, including polymer design. However, current LLMs are ineffective in this problem space because: (i) most models lack polymer-specific knowledge, and (ii) existing aligned models have limited coverage of knowledge and capabilities relevant to polymer design. Addressing this, we introduce PolyBench, a large-scale training and test benchmark dataset of more than 125K polymer design-related tasks, leveraging a knowledge base of more than 13 million data points obtained from experimental and synthetic data sources to ensure broad coverage of polymers and their properties. For effective alignment using PolyBench, we introduce a knowledge-augmented reasoning distillation method that augments this dataset with structured CoT. Furthermore, tasks in PolyBench are organized from simple to complex analytical reasoning problems, enabling generalization tests and diagnostic probes across the problem space. Experiments show that small- and mid-sized language models (SLMs) with 7B to 32B parameters, trained on PolyBench, outperform similar-sized models and remain competitive with closed-source frontier LLMs on PolyBench's test dataset, while demonstrating performance gains on external polymer benchmarks. Dataset and associated code are available at https://github.com/StonyBrookNLP/PolyBench.

2605.13743 2026-05-28 cs.LG

GHGbench: A Unified Multi-Entity, Multi-Task Benchmark for Carbon Emission Prediction

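Yifan Duan, Siyuan Zheng, Lihuan Li, Chao Xue, Flora Salim

Affiliations: School of Computer Science and Engineering

AI summary: Proposes GHGbench, a unified open dataset and benchmark for company- and building-level greenhouse-gas emission prediction; multimodal data fusion and standardized evaluation expose structural difficulty and the out-of-distribution generalization gap.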

Abstract

Open datasets and benchmarks for entity-level carbon-emission prediction remain fragmented across access, scale, granularity, and evaluation. We introduce GHGbench, an open dataset and benchmark for company- and building-level greenhouse-gas prediction. The company track contains 32,000+ company-year records from 12,000+ firms with Scope 1+2 and Scope 3 disclosures and financial/sectoral signals; the building track harmonises 491,591 building-year records from 13 open sources into a single schema across 26 metropolitan areas (10 U.S., 15 Australian, 1 Singaporean), with climate covariates and multimodal remote-sensing embeddings. GHGbench defines canonical splits with in-distribution and cross-region/city transfer as primary tasks and temporal hold-out plus short-horizon forecasting as supplementary appendix evidence; headline baselines span gradient-boosted trees, a tabular foundation model, MLP, FT-Transformer, and multimodal fusion, with an LLM panel as auxiliary, all evaluated under multi-seed paired-bootstrap tests. Three benchmark-level findings emerge: (i) building emissions are structurally harder than company emissions; (ii) the in-distribution to out-of-distribution gap dwarfs any within-model gap across both the company track and the building track, and a tabular foundation model is, to our knowledge, the first baseline to open a paired-bootstrap-significant gap over tuned trees on a multi-city building-emissions task; (iii) multimodal remote-sensing embeddings help precisely where tabular generalisation breaks. GHGbench also exposes catastrophic city transfer and the sector-factor lookup ceiling as systematic failure modes. Code and reconstruction recipes are available at GHGbench.
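A paired bootstrap compares two models on the same test items by resampling item indices and recomputing the error difference each time. The sketch below shows one common form of the procedure. It is not the benchmark's evaluation code: the function name, the resample count, and the error values are all hypothetical, and the multi-seed aspect mentioned in the abstract is left out.

```python
import random

def paired_bootstrap_win_rate(errors_a, errors_b, n_resamples=2000, seed=0):
    """Fraction of resamples in which model A has lower total error than model B.

    errors_a[i] and errors_b[i] must refer to the same test item, which is
    what makes the bootstrap 'paired'.
    """
    if len(errors_a) != len(errors_b) or not errors_a:
        raise ValueError("need two non-empty error lists of equal length")
    rng = random.Random(seed)
    n = len(errors_a)
    diffs = [b - a for a, b in zip(errors_a, errors_b)]  # positive: A is better
    wins = 0
    for _ in range(n_resamples):
        # Resample items with replacement, keeping each pair together.
        resampled = [diffs[rng.randrange(n)] for _ in range(n)]
        if sum(resampled) > 0:
            wins += 1
    return wins / n_resamples

# Hypothetical per-building absolute errors for two baselines.
tabular_fm = [0.8, 1.1, 0.9, 1.4, 0.7]
tuned_trees = [1.0, 1.3, 1.2, 1.5, 0.9]
win_rate = paired_bootstrap_win_rate(tabular_fm, tuned_trees)
```

Because both models are scored on the same resampled items, item-level difficulty cancels out, which makes small but consistent gaps easier to detect than with two independent bootstraps.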

2605.13517 2026-05-28 cs.CV cs.AI cs.LG

ArcVQ-VAE: A Spherical Vector Quantization Framework with ArcCosine Additive Margin

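Jaeyung Kim, YoungJoon Yoo

Affiliations: Department of Artificial Intelligence, Chung-Ang University, Seoul, Republic of Korea; SNUAILAB, Seoul, Republic of Korea

AI summary: To address the limited codebook capacity that restricts the representational power of VQ-VAE, proposes ArcVQ-VAE, which introduces a spherical angular-margin prior (Ball-Bounded Norm Regularization plus an ArcCosine Additive Margin Loss) to make latent representations more discriminative and uniformly dispersed, improving codebook utilization and achieving competitive performance on image reconstruction and generation.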

Comments To appear in Proceedings of the 43rd International Conference on Machine Learning (ICML 2026)

Abstract

Vector Quantized Variational Autoencoder (VQ-VAE) has become a fundamental framework for learning discrete representations in image modeling. However, VQ-VAE models must tokenize entire images using a finite set of codebook vectors, and this capacity limitation restricts their ability to capture rich and diverse representations. In this paper, we propose ArcCosine Additive Margin VQ-VAE (ArcVQ-VAE), a novel vector quantization framework that introduces a spherical angular-margin prior (SAMP) for the codebook of a conventional VQ-VAE. The proposed SAMP consists of Ball-Bounded Norm Regularization, which constrains all codebook vectors within a time-dependent Euclidean ball, and ArcCosine Additive Margin Loss, which encourages greater angular separability among latent vectors. This formulation promotes more discriminative and uniformly dispersed latent representations within the constrained space, thereby improving effective latent-space coverage and leading to improved codebook utilization. Experimental results on standard image reconstruction and generation tasks show that ArcVQ-VAE achieves competitive performance against baseline models in terms of reconstruction accuracy, representation diversity, and sample quality. The code is available at: https://github.com/goals4292/ArcVQ-VAE

2506.22726 2026-05-28 cs.CV cs.LG

XTransfer: Modality-Agnostic Few-Shot Model Transfer for Human Sensing at the Edge

Yu Zhang, Xi Zhang, Hualin Zhou, Xinyuan Chen, Shang Gao, Hong Jia, Jianfei Yang, Yuankai Qi, Tao Gu

Affiliations: Macquarie University, Sydney, NSW, Australia; Nanyang Technological University, Singapore; The University of Auckland, Auckland, New Zealand

AI summary: Proposes XTransfer, which uses model repairing and layer recombining to achieve modality-agnostic few-shot model transfer, lowering the costs of sensor data collection, model training, and edge deployment.

Comments Accepted at ICML2026

Journal ref Proceedings of the 43rd International Conference on Machine Learning (ICML 2026), Seoul, South Korea, 6-11 July 2026

Abstract

Deep learning for human sensing on edge systems presents significant potential for smart applications. However, its training and development are hindered by the limited availability of sensor data and resource constraints of edge systems. While transferring pre-trained models to different sensing applications is promising, existing methods often require extensive sensor data and computational resources, resulting in high costs and limited transferability. In this paper, we propose XTransfer, a first-of-its-kind method enabling modality-agnostic, few-shot model transfer with resource-efficient design. XTransfer flexibly uses pre-trained models and transfers knowledge across different modalities by (i) model repairing that safely mitigates modality shift by adapting pre-trained layers with only a small amount of sensor data, and (ii) layer recombining that efficiently searches and recombines layers of interest from source models in a layer-wise manner to restructure models. We benchmark various baselines across diverse human sensing datasets spanning different modalities. The results show that XTransfer achieves state-of-the-art performance while significantly reducing the costs of sensor data collection, model training, and edge deployment.

2605.12929 2026-05-28 cs.CV cs.AI

Anatomy-Slot: Unsupervised Anatomical Factorization for Homologous Bilateral Reasoning in Retinal Diagnosis

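Yingzhe Ma, Xiao Yang, Yuguo Yin, Zheyu Wang

Affiliations: University of Electronic Science and Technology of China; Peking University

AI summary: Proposes Anatomy-Slot, which uses an unsupervised anatomical bottleneck to decompose patch tokens into structurally coherent slots corresponding to anatomical regions and aligns the slots of the two eyes via bidirectional cross-attention. On ODIR-5K it improves AUC by 4.2 points over a ViT-L baseline, supporting the hypothesis that explicit structural correspondence improves diagnosis.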

Comments 15 pages, 3 figures

Abstract

Retinal diagnosis is inherently bilateral: clinicians compare homologous structures across eyes (e.g., optic disc asymmetry), yet most deep models operate on monocular representations. We investigate whether explicit structural correspondence improves diagnosis, and propose Anatomy-Slot to operationalize this hypothesis. Anatomy-Slot introduces an unsupervised anatomical bottleneck by decomposing patch tokens into a set of emergent, structurally-coherent slots that correspond to anatomical regions, then aligning these slots across eyes via bidirectional cross-attention. On ODIR-5K with $n=10$ seeds, the method improves AUC by $4.2$ points over a matched ViT-L baseline (95% CIs; Wilcoxon signed-rank test, $W=0$, $p=0.002$). Pairing disruption and stress testing under Gaussian noise provide controlled tests of correspondence dependence and robustness under corruption. We further report quantitative optic disc grounding on REFUGE and cross-attention localization analysis. Beyond the reported gains, these results indicate that object-centric anatomical correspondence offers a principled path toward interpretable diagnostic systems aligned with clinical bilateral comparison.
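The reported $W=0$, $p=0.002$ with $n=10$ seeds is what an exact two-sided Wilcoxon signed-rank test returns when all ten paired differences have the same sign. The sketch below checks that arithmetic by enumeration. It assumes no ties or zero differences and takes $W$ as the smaller of the two signed-rank sums; the abstract does not say which test variant was used, so treat this as a plausibility check only.

```python
from itertools import product

def wilcoxon_exact_two_sided(n, w_observed):
    """Exact two-sided p-value for W = min(W+, W-), assuming n untied non-zero differences."""
    ranks = range(1, n + 1)
    total = n * (n + 1) // 2
    hits = 0
    # Under the null hypothesis every sign pattern is equally likely.
    for signs in product((0, 1), repeat=n):
        w_plus = sum(r for r, s in zip(ranks, signs) if s)
        if min(w_plus, total - w_plus) <= w_observed:
            hits += 1
    return hits / 2 ** n

p_value = wilcoxon_exact_two_sided(10, 0)
print(p_value)  # 2 / 1024
```

Only two of the 1024 sign patterns (all positive or all negative) give $W=0$, so 2/1024 ≈ 0.002 is the smallest two-sided p-value this test can produce with ten pairs.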

2604.04295 2026-05-28 cs.CL

Adaptive Cost-Efficient Evaluation for Reliable Patent Claim Generation

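Yongmin Yoo, Qiongkai Xu, Longbing Cao

Affiliations: Frontier AI Research Centre, Macquarie University School of Computing, FSE, Macquarie University

AI summary: Proposes ACE, a two-stage framework that exploits the categorical structure of patent errors for uncertainty-aware routing: a stage-one encoder predicts a distribution over error types, and claims whose entropy exceeds a threshold are passed to a stage-two expert LLM running a schema-constrained Chain-of-Patent-Thought protocol. ACE surpasses a 70B-parameter LLM baseline while cutting costs by 78%.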

Abstract

Automated patent claim validation demands low error tolerance. However, existing approaches face a rigidity-resource dilemma: lightweight encoders cannot track long-range legal dependencies, while exhaustive LLM verification incurs 4-5x higher overhead at million-claim scale. A naive confidence-based cascade cannot resolve this because binary validity scores fail to distinguish structurally distinct error types that require different reasoning depths. We propose a two-stage framework: Adaptive Cost-efficient Evaluation (ACE), which exploits the categorical structure of patent errors for uncertainty-aware routing. In the first stage, a fine-tuned encoder projects claims into a K+1 distribution over legal error types, whose predictive entropy serves as the routing signal. Claims exceeding an entropy threshold are escalated to the second stage, where an expert LLM executes a schema-constrained Chain-of-Patent-Thought (CoPT) protocol to map claim elements against 35 U.S.C. standards; the schema constraint reduces per-claim latency by 42% while producing legally grounded verdicts. We further present a 40,000-claim dataset ACE-40k with MPEP-grounded annotations, where ACE surpasses competitive baselines including a supervised 70B-parameter LLM while reducing costs by 78%. On real USPTO rejection data, the routing mechanism transfers without re-calibration, reducing inference time by 60% while maintaining competitive recall.
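The routing rule described above (escalate a claim when the predictive entropy of the stage-one distribution exceeds a threshold) can be sketched in a few lines. The probabilities, the threshold value, and the function names below are illustrative assumptions; none of them come from the paper.

```python
import math

def predictive_entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def route(probs, threshold):
    """Send uncertain claims to the stage-two LLM, keep confident ones at stage one."""
    return "stage2" if predictive_entropy(probs) > threshold else "stage1"

# Hypothetical K+1 = 4 outputs: three legal error types plus "no error".
confident = [0.94, 0.02, 0.02, 0.02]
uncertain = [0.40, 0.35, 0.15, 0.10]
THRESHOLD = 0.8  # illustrative value, not taken from the paper

decisions = [route(confident, THRESHOLD), route(uncertain, THRESHOLD)]
```

Entropy over the full K+1 distribution separates "unsure which error type applies" from "sure there is no error", a distinction that a single binary validity score cannot express; this is the motivation the abstract gives for not using a plain confidence cascade.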

2512.21075 2026-05-28 cs.LG cs.AI math.PR stat.ML

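Optimal LTLf Synthesis

Yujian Cao, Sven Schewe, Qiyi Tang, Shufang Zhu

Affiliations: University of Liverpool

AI summary: Introduces optimal LTLf synthesis, which maximizes the number of objectives that can be guaranteed, addressing strategy synthesis when a multi-objective specification cannot be realised in full; experiments confirm the approach's feasibility.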

Abstract

Strategy synthesis typically follows an all-or-nothing paradigm, returning unrealisable whenever a specification cannot be guaranteed in an uncertain environment. In this paper, we introduce optimal LTLf synthesis, where the goal is to realise as many objectives as possible from a given specification consisting of multiple objectives, especially for the case that they are not all jointly realisable. We first consider max-guarantee synthesis, which commits to a maximal set of objectives that we can a priori guarantee to realise. We then introduce max-observation synthesis, which maximises a posteriori realised objectives that may be incomparable on different executions. Finally, we present incremental max-observation synthesis, which further improves strategies by exploiting opportunities for stronger guarantees when they arise during an execution. Experimental results show that different variations of optimal synthesis scale broadly equally well, solving a large fraction of the benchmark instances within the given timeout, demonstrating the practical feasibility of the approach.

2605.10583 2026-05-28 cs.CV

FrequencyCT: Frequency Domain Self-supervised Low-dose CT Denoising

Guoquan Wei, Liu Shi, Chong Chen, Qiegen Liu

Affiliations: School of Information Engineering, Nanchang University; SKLMS, ICMSEC, Academy of Mathematics and Systems Science, Chinese Academy of Sciences

AI summary: Proposes FrequencyCT, a zero-shot self-supervised method for low-dose CT denoising that generates pseudo-samples in the frequency domain by exploiting the different distributions of noise and true signal, and stabilizes optimization through data truncation; experiments support its clinical potential.

Abstract

Despite extensive research on computed tomography (CT) denoising, few studies exploit projection-domain data characteristics to mitigate noise correlation. To bridge this gap, this work proposes FrequencyCT, the first zero-shot self-supervised method for pseudo-sample generation in the frequency domain for low-dose CT denoising. Specifically, by exploiting the distinct frequency-domain distributions of noise and true signal, a regional low-frequency anchoring technique is proposed. Applying phase-preserving noise and mask perturbations to the high-frequency region generates pseudo-samples for self-supervision. Driven by the exponential correlation between noise variance of noisy projections and the underlying true signal, consistent data truncation is applied to the generated samples to stabilize optimization gradients. Evaluation results on multiple public and real datasets confirm the clinical application potential of this research, which provides an innovative perspective for the field of denoising. The code is available at: https://github.com/yqx7150/FrequencyCT.

2605.10581 2026-05-28 cs.CV

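Heterogeneous Dependency Graph-Guided Attention for Patent Representation Learning

Yongmin Yoo, Qiongkai Xu, Zhangkai Wu, Longbing Cao

Affiliations: Frontier AI Research Centre, Macquarie University School of Computing, FSE, Macquarie University

AI summary: To address the neglected dependency hierarchy among patent claims, proposes the Patent Heterogeneous Attention Graph Encoder (PHAGE), which builds a typed graph separating legal citations from technical relations and introduces a connectivity mask with learnable biases to project claim-level topology into token-level attention. Combined with dual-granularity contrastive learning, it outperforms domain-adapted and citation-aware baselines on classification, retrieval, and clustering.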

Abstract

Pre-trained language models advance patent classification and retrieval by encoding claims as flat token sequences, yet they overlook the dependency hierarchy among claims. Incorporating the hierarchy into self-attention poses two challenges. First, claim dependencies involve relation types with varying reliability: treating them indiscriminately allows noisy technical relations to corrupt cleaner legal citation signals. Second, when the dependency graph is defined over claims, Transformer models fail as they operate at the token level; broadcasting claim-level adjacency can dilute structural information across unrelated token pairs. A novel Patent Heterogeneous Attention Graph Encoder (PHAGE) addresses these challenges. To handle heterogeneous dependencies, PHAGE constructs a typed graph to separate legal citations from technical relations as distinct edge types. To bridge the hierarchy gap, PHAGE introduces a connectivity mask with learnable relation-aware biases to project a claim-level topology into token-level attention. PHAGE learns a dual-granularity contrastive objective to align representations with inter-patent taxonomy and intra-patent topology. Experiments show that PHAGE outperforms domain-adapted and citation-aware baselines on patent classification, retrieval, and clustering. PHAGE reveals that the intra-patent claim topology captures stronger inductive bias than the inter-patent structure.
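Projecting a claim-level typed graph into token-level attention can be pictured as building an additive bias matrix: token pairs in connected claims receive a bias that depends on the edge type, and pairs in unconnected claims are masked out. The sketch below is one possible reading under our own assumptions. Edges are directed, bias values are fixed numbers standing in for learned parameters, and every name is hypothetical; the paper's actual masking scheme may differ.

```python
NEG_INF = float("-inf")

def token_attention_bias(token_claim, edges, bias):
    """Expand a typed claim graph into a token-by-token additive attention bias.

    token_claim[i] is the claim index of token i.
    edges maps a (source_claim, target_claim) pair to an edge type.
    bias maps an edge type to a scalar (learned in a real model, fixed here).
    Token pairs whose claims are not connected get -inf, i.e. they are masked.
    """
    n = len(token_claim)
    out = [[NEG_INF] * n for _ in range(n)]
    for i, ci in enumerate(token_claim):
        for j, cj in enumerate(token_claim):
            if ci == cj:
                out[i][j] = 0.0  # tokens within one claim attend freely
            elif (ci, cj) in edges:
                out[i][j] = bias[edges[(ci, cj)]]
    return out

# Hypothetical patent: claim 1 legally cites claim 0, claim 2 is technically related to claim 0.
token_claim = [0, 0, 1, 1, 2]
edges = {(1, 0): "legal", (2, 0): "technical"}
bias = {"legal": 1.0, "technical": 0.25}
mask = token_attention_bias(token_claim, edges, bias)
```

Keeping a separate bias per edge type lets a model weight the cleaner legal citation signal differently from the noisier technical relations, which is the concern the abstract raises about treating all dependencies alike.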

2508.11011 2026-05-28 cs.CV

Are Large Pre-trained Vision Language Models Effective Construction Safety Inspectors?

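Xuezheng Chen, Zhengbo Zou

Affiliations: Mechanical Engineering

AI summary: Presents the ConstructionSite 10k dataset, with 10,000 construction-site images annotated for three tasks, and evaluates the zero-shot and few-shot generalization of large pre-trained vision-language models, providing a benchmark for construction safety inspection.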

DOI: 10.1017/dce.2026.10044

Abstract

Construction safety inspections typically involve a human inspector identifying safety concerns on-site. With the rise of powerful Vision Language Models (VLMs), researchers are exploring their use for tasks such as detecting safety rule violations from on-site images. However, there is a lack of open datasets to comprehensively evaluate and further fine-tune VLMs in construction safety inspection. Current applications of VLMs use small, supervised datasets, limiting their applicability in tasks they are not directly trained for. In this paper, we propose the ConstructionSite 10k, featuring 10,000 construction site images with annotations for three inter-connected tasks, including image captioning, safety rule violation visual question answering (VQA), and construction element visual grounding. Our subsequent evaluation of current state-of-the-art large pre-trained VLMs shows notable generalization abilities in zero-shot and few-shot settings, while additional training is needed to make them applicable to actual construction sites. This dataset allows researchers to train and evaluate their own VLMs with new architectures and techniques, providing a valuable benchmark for construction safety inspection.

Detecting and Mitigating the Correct-Answer Extinction Window in Test-Time Reinforcement Learning with Majority Voting

Semantic-Enriched Latent Visual Reasoning

Democratizing Large-Scale Re-Optimization with LLM-Guided Model Patches

Understanding Self-Supervised Learning via Latent Distribution Matching

iPOE: Interpretable Prompt Optimization via Explanations

TacSE3: Equivariant SE(3) Motion Estimation from Low-Texture Visuotactile Images for In-Gripper Tracking and Compensation

SNLP: Layer-Parallel Inference via Structured Newton Corrections

DataClawBench: An Agent Benchmark for Exploratory Real-World Financial Data Analysis

Mind Dreamer: Untethering Imagination via Active Causal Intervention on Latent Manifolds

Are VLMs Seeing or Just Saying? Uncovering the Illusion of Visual Re-examination

Self-Prompting Diffusion Transformer for Open-Vocabulary Scene Text Editing via In-Context Learning

GQLA: Group-Query Latent Attention for Hardware-Adaptive Large Language Model Decoding

GFMate: Empowering Graph Foundation Models with Test-time Prompt Tuning

Smooth Multi-Policy Causal Effect Estimation in Longitudinal Settings

Teaching and Evaluating LLMs to Reason About Polymer Design Related Tasks

GHGbench: A Unified Multi-Entity, Multi-Task Benchmark for Carbon Emission Prediction

ArcVQ-VAE: A Spherical Vector Quantization Framework with ArcCosine Additive Margin

XTransfer: Modality-Agnostic Few-Shot Model Transfer for Human Sensing at the Edge

Anatomy-Slot: Unsupervised Anatomical Factorization for Homologous Bilateral Reasoning in Retinal Diagnosis

Adaptive Cost-Efficient Evaluation for Reliable Patent Claim Generation

Feature Learning Dynamics in Infinite-Depth Neural Networks

Mitigating Cross-Lingual Cultural Inconsistencies in LLMs via Consensus-Driven Preference Optimisation

One-Step Generative Modeling via Wasserstein Gradient Flows

Optimal LTLf Synthesis

FrequencyCT: Frequency Domain Self-supervised Low-dose CT Denoising

Polygon-mamba: Retinal vessel segmentation using polygon scanning mamba and space-frequency collaborative attention

Verifiable Process Rewards for Agentic Reasoning

Informative Path Planning with Guaranteed Estimation Uncertainty

Heterogeneous Dependency Graph-Guided Attention for Patent Representation Learning

Are Large Pre-trained Vision Language Models Effective Construction Safety Inspectors?