arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.13102 2026-06-12 cs.RO New submission

FTP-1: A Generalist Foundation Tactile Policy Across Tactile Sensors for Contact-Rich Manipulation

Chengbo Yuan, Zicheng Zhang, Mingjie Zhou, Wendi Chen, Yi Wang, Zhuoyang Liu, Dantong Niu, Shuo Wang, Hui Zhang, Wenkang Zhang, Yingdong Hu, Yuanqing Gong, Wanli Xing, Chuan Wen, Cewu Lu, Kaifeng Zhang, Yang Gao

Affiliations: Tsinghua University; Shanghai Qi Zhi Institute; Sharpa; Shanghai Jiao Tong University; University of California, Berkeley; ETH Zurich; Fudan University; Shanghai Innovation Institute

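AI summary: Proposes FTP-1, the first generalist foundation tactile policy. Using heterogeneous encoders and a shared Transformer expert, it is pretrained on about 3,000 hours of data spanning 21 sensors, transfers tactile manipulation skills across sensors, and raises the success rate by 31% on unseen sensors.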

Abstract

Despite the success of vision-based generalist robotic policies, existing tactile-based policies remain tied to fixed embodiments and sensor setups. This is because tactile signals are highly heterogeneous across hardware, making cross-sensor generalization difficult. We present FTP-1, the first generalist foundation tactile policy pretrained to acquire transferable tactile manipulation abilities across diverse sensors and embodiments. FTP-1 supports varied tactile inputs, including image-, array-, and state-based signals, by using heterogeneous encoders to project them into unified morphology-aware latent tokens that are jointly modeled by a shared tactile Transformer expert. Pretrained on around 3,000 hours of tactile manipulation data aggregated from 26 data sources, spanning human and robot demonstrations across 21 sensors, FTP-1 learns tactile skills that transfer beyond the sensors seen during pretraining. Across downstream finetuning experiments spanning 5 hardware configurations, FTP-1 improves contact-rich manipulation on seen sensor setups by +17.2% and, surprisingly, transfers to two previously unseen tactile-sensor setups, achieving a +31% gain in success rate. FTP-1 establishes the first unified foundation baseline for tactile manipulation, providing future tactile policies with a shared model-level starting point. Pretrained models, datasets, training code, and more visualizations are available at this https URL.

2606.13100 2026-06-12 cs.CL New submission

LEDGER: A Long-Context Benchmark of Corporate Annual Reports for Grounded Financial Retrieval and Extraction

Charles Moslonka, Amaury de Vitry, Arthur Garnier, Hicham Randrianarivo, Emmanuel Malherbe

Affiliations: Artefact Research Center; MICS, CentraleSupélec, Université Paris-Saclay; Ardian

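AI summary: Proposes the LEDGER benchmark, comprising 4,999 digitized corporate annual reports, to evaluate large language models on long-context financial tasks, covering KPI retrieval, single-value lookup, and full extraction.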

Comments
5 pages, 1 figure

Abstract

Financial reporting is a natural proving ground for large language models, and the very-long-context capabilities of recent models across all sizes make rigorous evaluation in this domain an increasingly pressing need. Yet most public financial resources reduce the task to plain-text SEC 10-K filings paired with a handful of question-answer items. We release LEDGER (Long-context Evaluation of Documents for Grounded Extraction and Retrieval), a corpus of 4,999 digitized corporate annual reports: full documents with figures, tables, and narrative, not just regulatory filings. Each report is labeled with 31 consolidated financial KPIs to be extracted and linked to the market's reaction at the earnings date. From this data we derive three evaluation benchmarks spanning the difficulty spectrum: a pure page-level KPI retrieval task with TREC-style relevance judgments over 118,048 questions in natural language, a conversational "needle-in-a-haystack" single-value lookup, and a full KPI extraction task, both from long, numerically dense reports. We additionally provide human OCR-quality annotations with inter-annotator agreement and the complete extraction, validation, and scoring toolchain. We further demonstrate the dataset's research utility with a case study linking CEO-letter rhetoric to post-publication market impact.

2606.13097 2026-06-12 cs.PL cs.AI New submission

Functional Cache Grafting: Robust and Rapid Code-Policy Synthesis for Embodied Agents

Saehun Chun, Wonje Choi, Sera Choi, Sanghyun Ahn, Honguk Woo

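AI summary: Proposes the FCGraft framework, which maintains function-level validated code skeletons together with their key-value caches and grafts those caches (stitching and patching) for new tasks, reducing prefill computation and reusing validated structures for more robust and faster policy synthesis.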

Comments
Accepted at ICML 2026

Abstract

Code-writing large language models (CodeLLMs) generate executable code policies for embodied agents by translating natural language goals and environmental constraints into structured control programs. However, policy generation in open-domain embodied environments suffers from two fundamental limitations: (i) delayed decoding caused by repetitive prefill computation over long prompts, and (ii) limited robustness due to fully generative decoding, which often produces API mismatches, missing safety guards, and unstable control logic. To address these limitations, we present FCGraft, a Functional Cache Grafting framework. FCGraft maintains a library of function-level validated code skeletons and their associated prompt-level Transformer key-value (KV) caches, and synthesizes new policies by retrieving relevant functions and grafting their KV caches when a new task is provided. Given retrieved function caches, FCGraft performs cache grafting via stitching, which composes cached function segments into a composite policy, and patching, which locally adapts only the necessary code regions to satisfy task-specific parameters and constraints with minimal additional decoding. By eliminating redundant prefill computation, this approach reduces generation latency, while reusing validated control structures improves robustness. Compared with the prompt-level caching method RAGCache, FCGraft achieves an 18.31% higher task success rate and 2.3x faster policy synthesis.

2606.13096 2026-06-12 cs.CV New submission

Unified MRI Brain Image Translation via Hierarchical Tumor Structure Comparison

Yupeng Cai, Jia Wei, Jianlong Zhou

Affiliations: South China University of Technology; UTS Data Science Institute, University of Technology Sydney

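AI summary: Proposes the HTSCGAN model, which improves multi-modal MRI brain image translation quality through hierarchical tumor structure comparison and multiple loss functions, with strong results on BraTS2020 and BraTS2021.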

Abstract

Multi-modal MRI brain image translation via available modalities holds significant practical importance in modern medicine, providing robust support for early diagnosis, treatment planning, and outcome assessment of diseases. For this purpose, it is important to ensure the fidelity of the tumor regions after translation. However, existing brain image translation methods ignore the structure information of different tumor regions, which could assist translation models in enhancing the quality and clinical applicability of the translated images. In this work, we propose a novel translation model called HTSCGAN, which is a unified multi-modal brain image translation generative adversarial model integrating the structural information within tumor regions with the aim of improving the quality of brain image translation. Specifically, the generator employs three Patch Contrast Modules (PCMs) with different patch sizes to capture the hierarchical structural information of the tumor regions. In addition, a pretrained Patch Classifier (PC) and a pretrained Structure-Aware Encoder (SAE) are employed to encourage the generated image to contain the same tumor region structure as the ground-truth image via patch classification loss and tumor perceptual loss, respectively. The experiments on BraTS2020 and BraTS2021 demonstrate strong performance of our model in both translation tasks and downstream segmentation tasks, highlighting its effectiveness in enhancing the quality and clinical relevance of the translated brain images. Our code is available at this https URL.

2606.13093 2026-06-12 cs.GT New submission

Equilibrium Computation in Extensive-Form Games with Stochastic Action Sets

Thomas Schwarz, Ryann Sim, Chun Kai Ling

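AI summary: Addresses extensive-form games in which actions become unavailable due to exogenous stochasticity by proposing a stochastic action set model. An expansion procedure shows that standard strategy representations can grow exponentially, but under an independence assumption a compact polynomial-size representation exists; the SI-CFR algorithm converges to Nash equilibria with high probability.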

Comments
35 pages, 4 figures

Abstract

Extensive-form games (EFGs) are a standard model for sequential decision-making in games. A fundamental and typically implicit assumption in EFGs is that players always have access to all of their actions at every decision point. However, in many realistic settings, certain actions might be unavailable during game-play due to exogenous stochasticity, hindering the expressivity of the standard EFG model. Given a 'base' EFG, we formalize a model that allows for actions to be stochastically restricted, leading to a corresponding Extensive-Form Game with Stochastic Action Sets (EFGSAS). In EFGSAS, we derive an expansion procedure that results in an equivalent EFG, thus showing that standard strategy formalisms could require exponentially large representations. However, under an appropriate independence assumption, we show that compact strategy representations polynomial in the size of the base EFG exist. Computationally, we introduce an algorithm called SI-CFR that minimizes sleeping internal regret, converging to Nash equilibria with high probability in two-player zero-sum EFGSAS. Finally, we utilize a stochastic approximation procedure to recover compact representations of Nash equilibria, utilizing only the iterates of SI-CFR.

2606.13089 2026-06-12 cs.NE New submission

Multi-Objective Coevolution of Prompts and Templates for Circuit Approximation

Martin Tomasovic, Lukas Sekanina

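AI summary: Proposes a co-evolutionary algorithm that uses an off-the-shelf large language model, without domain-specific training, to automatically design optimized 8-bit approximate multipliers. By evolving candidate circuits and prompt templates simultaneously, it achieves better error-area trade-offs than the EvoApproxLib library.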

Comments
To appear at Parallel Problem Solving From Nature (PPSN), Trento, IT, 2026

Abstract

Approximate multipliers deliberately relax computational accuracy to achieve gains in power efficiency, latency, and silicon area, which makes them well-suited for error-resilient applications such as neural networks. In this work, we introduce a co-evolutionary algorithm that leverages an off-the-shelf large language model (LLM) without requiring domain-specific training to automate the design of optimized 8-bit approximate multipliers. The approach simultaneously evolves a population of candidate circuits and a population of prompt templates that steer LLM-driven modifications. Experimental results for several target design objectives demonstrate that the proposed method discovers approximate multipliers with improved error-area trade-offs compared to highly optimized circuits from the EvoApproxLib library.

2606.13086 2026-06-12 cs.NI New submission

Revolutionizing Wireless Communications with Space Data Centers: Applications and Open Challenges

Minghao Sun, Zehui Chen, Jinbo Hou, Kezhi Wang, Xiaoli Chu

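AI summary: Proposes space data centers (SDCs) as orbital computing infrastructure, uses a hierarchical network architecture to enable task-oriented information exchange, and validates their effectiveness in reducing control-layer latency.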

Comments
submitted for possible publication

Abstract

Space data centers (SDCs) are emerging as a promising orbital computing infrastructure for the future AI industry. Unlike conventional satellites that mainly serve as relay nodes or lightweight onboard processors, SDCs integrate communication, computing, storage, and control capabilities in orbit, enabling persistent service support for data-intensive and intelligence-driven space applications. In this article, we investigate how SDCs may transform space communication paradigms from connectivity-oriented data transmission toward task-oriented and service-centric information exchange. We first present a hierarchical SDC network architecture consisting of access, relay, computing, and control layers, and outline possible deployment strategies. We then explore representative future application scenarios enabled by SDCs, highlighting their communication characteristics and associated research challenges. Simulation results further demonstrate the effectiveness of SDCs in reducing control-layer latency in hierarchical space networks. Finally, we identify key research directions toward the practical deployment of SDCs.

2606.13083 2026-06-12 cs.GT New submission

Leveraging Matchings in Constrained Fair Division with a Conflict Graph

Evangelos Markakis, Michalis Samaris

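AI summary: Studies fair division of indivisible goods under conflict-graph constraints, analyzes the existence and computation of EF1 allocations via matching theory, and gives parameterized results and an approximation algorithm.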

Comments
23 pages

Abstract

We study the problem of allocating indivisible goods under constraints, expressed via a conflict graph $G$. In such an instance, the $m$ items are the vertices of $G$ and connected items cannot be allocated in the same bundle. Under this model, it is already known that EF1 allocations may not exist. Our main contribution is an analysis, parametrized by the maximum degree $\Delta(G)=\Delta$, of the existence and computation of complete EF1 allocations. We address this question in various cases by leveraging results from matching theory. First, we provide a tight existence result for agents with ordered valuations and for the broader class of tiered valuations. We present an algorithm that returns an EF1 allocation when the number of items does not exceed a specific bound. This bound is determined by $n$ and $\Delta$, and it is tight when $\Delta$ is greater than $2n/3$. We also construct an approximation algorithm when $m$ exceeds this bound. For general additive valuations the problem becomes more challenging. Given the current impossibility results, we focus on the case where the number of items is at most $2n$. For this case, we provide an almost complete picture for the instances that admit EF1 allocations, by combining Round Robin with matchings.
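
The abstract names Round Robin as one building block. As a rough illustration only, the Python sketch below shows the classic unconstrained Round Robin procedure for nonnegative additive valuations, together with an EF1 check. It does not model the conflict graph $G$ or the matching step described in the abstract, and the function names `round_robin` and `is_ef1` are hypothetical, not taken from the paper.

```python
from typing import List


def round_robin(valuations: List[List[float]]) -> List[List[int]]:
    """Agents take turns in a fixed cyclic order, each picking the
    remaining item they value most. valuations[i][j] is agent i's
    (nonnegative, additive) value for item j."""
    n = len(valuations)
    m = len(valuations[0]) if n else 0
    remaining = set(range(m))
    bundles: List[List[int]] = [[] for _ in range(n)]
    turn = 0
    while remaining:
        agent = turn % n
        # Ties are broken toward the lowest item index for determinism.
        best = max(sorted(remaining), key=lambda j: valuations[agent][j])
        bundles[agent].append(best)
        remaining.remove(best)
        turn += 1
    return bundles


def is_ef1(valuations: List[List[float]], bundles: List[List[int]]) -> bool:
    """Check envy-freeness up to one good: agent i must not envy agent k
    once the item i values most is removed from k's bundle."""
    n = len(valuations)
    for i in range(n):
        own = sum(valuations[i][j] for j in bundles[i])
        for k in range(n):
            if i == k or not bundles[k]:
                continue
            other = sum(valuations[i][j] for j in bundles[k])
            biggest = max(valuations[i][j] for j in bundles[k])
            if own < other - biggest:
                return False
    return True


# Illustrative instance (made up for this sketch): 2 agents, 4 items.
example_valuations = [[5, 3, 1, 4], [4, 5, 2, 1]]
example_bundles = round_robin(example_valuations)
```

Without conflict constraints, this procedure is known to return an EF1 allocation for additive valuations over goods; the constrained setting in the abstract is harder precisely because a picked item may be forbidden by items already in the picker's bundle.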

2606.13082 2026-06-12 cs.CL New submission

sebis at CRF Filling 2026: A Two-Stage Local LLM Pipeline for Medical CRF Filling

Katharina Sommer, Tristan Till, Florian Matthes

Affiliations: Technical University of Munich

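AI summary: Proposes a two-stage local pipeline based on MedGemma-27B that separates binary presence classification from value extraction and preserves privacy through few-shot in-context learning, achieving 0.55 macro-F1 on the CRF filling task and ranking second among locally hosted, open-source submissions.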

Comments
Published in Proceedings of the Third Workshop on Patient-Oriented Language Processing (CL4Health), LREC 2026

Abstract

The extraction of structured clinical information from unstructured EHR notes is a persistent bottleneck in healthcare informatics. While large language models (LLMs) offer high performance, their deployment in clinical settings is hindered by privacy risks, inference costs, and the tendency to hallucinate beyond textual evidence. We address these challenges for the CL4Health 2026 Case Report Form (CRF) filling task by proposing a fully local, domain-adapted pipeline using the MedGemma-27B model. Our two-stage architecture, which separates binary presence classification from value extraction, enforces strict adherence to textual evidence and ensures deterministic outputs for negated, uncertain, or unknown states. By leveraging item-specific, few-shot in-context learning without external API calls or fine-tuning, our approach achieves a macro-F1 score of 0.55 on the official English test track. This result secures second place among all locally-hosted, open-source submissions. Our work demonstrates that privacy-preserving, on-premise LLM pipelines can achieve near-competitive performance with proprietary frontier models, providing a practical, data-sovereign framework for clinical NLP.

2606.13081 2026-06-12 cs.LG cs.AI New submission

Emotional regulation improves deep learning-based image classification

Riccardo Emanuele Landi, João M. F. Rodrigues, Marta Chinnici

Affiliations: Mare Group; NOVA LINCS; Institute of Engineering (ISE), University of Algarve; Department of Energy Technologies and Renewable Sources, ENEA Casaccia Research Center

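AI summary: Proposes the Emotional Regulation framework, which models emotion in deep learning through artificial subjective experience. ResNet and ViT are pretrained for image classification, and the method outperforms existing approaches on CIFAR-10/100, setting a new state of the art for emotion-augmented deep learning.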

Abstract

Emotion significantly influences cognition, enhancing memory and learning under certain conditions. Drawing on this principle, emotion-augmented deep learning investigates how affective states can improve neural network architectures and learning paradigms, achieving better generalization than non-emotional models. However, existing methods often rely solely on objective neurophysiological factors, neglecting the role of subjectivity in emotion. To bridge this gap, the present study introduces Emotional Regulation, a novel framework for modeling emotion in deep learning through artificial subjective experience. The method employs pre-training based on affective stimuli, balancing non-emotional and emotionally influenced responses in downstream task optimization. Extensive experimentation was conducted in image classification, pre-training ResNet and ViT architectures on four emotional datasets, using CIFAR-10 and CIFAR-100 as target benchmarks. Results reveal improvements over the aforementioned backbones, providing evidence of Emotional Regulation as a promising method for defining emotion-augmented deep learning through artificial subjective experience. Furthermore, the proposed approach outperforms related work in image classification based on CIFAR, revealing Emotional Regulation as the new state of the art in emotion-augmented deep learning for large-scale vision datasets. The study also reinforces the evidence that affective states improve the optimization of machine learning tasks, encouraging further investigation of emotion-inspired architectures.

2606.13079 2026-06-12 cs.CR cs.AI New submission

The Emergence of Autonomous Penetration Capabilities in Large Language Model-Powered AI Systems

Jiaqi Luo, Jiarun Dai, Zhile Chen, Jia Xu, Weibing Wang, Yawen Duan, Brian Tse, Geng Hong, Xudong Pan, Yuan Zhang, Min Yang

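AI summary: To address opaque methodologies and oversimplified scenarios in existing evaluations, builds an autonomous penetration evaluation framework with two tiers of target servers and a general-purpose agent scaffold. Tests on 19 LLMs show success rates from 10.7% to 69.3%, with capability improving as overall model capability grows.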

Abstract

Nowadays, the autonomous execution of cyberattacks capable of causing substantial real-world harm is widely regarded as one of the critical red lines that frontier AI systems must not cross. Within this broader red-line scenario, autonomous penetration represents a core enabling capability and subtask: the ability of LLM-powered AI systems to independently conduct adversarial operations against a target server without human intervention, identify and exploit vulnerabilities, and obtain unauthorized access or control. A growing body of work has sought to assess the autonomous penetration capabilities of AI systems. However, existing evaluations often employ opaque methodologies, rely on unrealistic or overly simplified penetration-testing scenarios, or provide LLMs with excessive prior knowledge and task-specific guidance, and cannot accurately capture the extent to which modern AI systems can autonomously perform this core capability within broader high-impact cyberattack scenarios. To address these limitations, we construct a new autonomous penetration evaluation framework consisting of two components: target servers and agent scaffolding. Specifically, on the target-server side, we design two levels of target environments based on the number of secure services without known vulnerabilities deployed alongside a vulnerable service: Tier 1 (one secure service) and Tier 2 (three secure services), resulting in a total of 300 target servers. Meanwhile, the agent scaffolding adopts a general-purpose agent architecture equipped with a set of general-purpose cybersecurity tools, without any target-specific prior knowledge. We evaluate 19 open-weight and proprietary LLMs, and find that current models achieve penetration success rates ranging from 10.7% to 69.3%. Moreover, we observe that autonomous penetration capability continues to improve alongside advances in overall model capability.

2606.13076 2026-06-12 cs.MA cs.GT cs.LG New submission

$\alpha$-fair heterogeneous agent reinforcement learning

Yao-hua Franck Xu, Tayeb Lemlouma, Jean-Marie Bonnin, Arnaud Braud

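AI summary: Proposes a framework combining $\alpha$-fairness with Heterogeneous-Agent Trust Region Learning (HATRL). A fair advantage function dynamically weights agent utilities, giving monotonic improvement and convergence to Nash equilibria, and outperforming HATRL algorithms in sequential social dilemmas.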

Abstract

Cooperation in multi-agent systems is typically optimized through utilitarian objectives that maximize overall efficiency but fail to account for reward distribution, often resulting in inequitable "leader-follower" dynamics. While fairness-based approaches encourage pro-social behaviors where every agent benefits from cooperation, many current algorithms, including those utilizing reward shaping, break the stationarity of Markov Games or lack rigorous theoretical guarantees. This creates a critical gap between fair objective methods and theoretically safe learning frameworks. We propose a novel framework that bridges $\alpha$-fairness with Heterogeneous-Agent Trust Region Learning (HATRL), ensuring monotonic improvement and convergence toward Nash Equilibria. Our approach leverages a fair advantage function that dynamically weights agent utilities based on their expected returns, allowing the global objective to transition from purely utilitarian efficiency to $\alpha$-fairness welfare based on the parameter $\alpha$. We introduce two practical algorithms, $\alpha$-fair HATRPO and $\alpha$-fair HAPPO, and demonstrate through experiments in sequential social dilemmas like CleanUp and CommonHarvest that they perform better than HATRL's algorithms from a utilitarian point of view while achieving socially higher outcomes.
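
To make the transition from utilitarian efficiency to $\alpha$-fair welfare concrete, the Python sketch below uses the standard textbook $\alpha$-fair welfare function: the sum of $u_i^{1-\alpha}/(1-\alpha)$ for $\alpha \neq 1$, and the sum of $\log u_i$ for $\alpha = 1$. The per-agent weights $u_i^{-\alpha}$ are simply the gradient of that welfare; using them to weight agent utilities is an assumption made for illustration and may differ from the fair advantage function defined in the paper. The names `alpha_fair_welfare` and `alpha_fair_weights` are hypothetical.

```python
import math
from typing import List, Sequence


def alpha_fair_welfare(returns: Sequence[float], alpha: float) -> float:
    """Standard alpha-fair welfare of strictly positive expected returns.

    alpha = 0 gives the utilitarian sum, alpha = 1 gives proportional
    fairness (sum of logs), and larger alpha puts more weight on the
    worst-off agent.
    """
    if alpha < 0:
        raise ValueError("alpha must be nonnegative")
    if any(r <= 0 for r in returns):
        raise ValueError("returns must be strictly positive")
    if math.isclose(alpha, 1.0):
        return sum(math.log(r) for r in returns)
    return sum(r ** (1.0 - alpha) / (1.0 - alpha) for r in returns)


def alpha_fair_weights(returns: Sequence[float], alpha: float) -> List[float]:
    """Gradient of the welfare with respect to each agent's return.

    Illustrative assumption: agents with lower expected return receive a
    larger weight, and alpha = 0 recovers uniform (utilitarian) weights.
    """
    if any(r <= 0 for r in returns):
        raise ValueError("returns must be strictly positive")
    return [r ** (-alpha) for r in returns]
```

With `alpha = 0` every agent gets weight 1, so the objective is the plain sum of returns; with `alpha = 1` an even split such as `[2, 2]` scores higher than `[1, 3]` even though both have the same total.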

2606.13071 2026-06-12 cs.CY cs.AI cs.HC New submission

"Is This Not Enough?": Asymmetries in Institutional Accountability and Collective Sensemaking in the Case of Canada's Algorithmic Visa Triage System

Dipto Das, Matthew Tamura, Syed Ishtiaque Ahmed, Shion Guha

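AI summary: Studies how algorithmic accountability in Canada's visa system is articulated by institutions and experienced by applicants. Institutions emphasize transparency and procedural safeguards, whereas applicants rely on collective sensemaking to cope with opaque decisions, revealing epistemic, jurisdictional, and temporal-relational asymmetries.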

Abstract

This paper examines how algorithmic accountability in Canada's visa system is articulated institutionally and experienced by applicants across borders. We analyzed Immigration, Refugees and Citizenship Canada (IRCC)'s Algorithmic Impact Assessment (AIA) for the temporary resident visa (TRV) triage system using the algorithmic decision-making adapted for the public sector (ADMAPS) framework and analyzed Reddit discussions among applicants using a mixed-methods approach. We show that while institutional artifacts emphasize transparency, procedural safeguards, and bounded impacts, applicants engage in collective sensemaking to interpret opaque decisions, often relying on peer knowledge amid uncertainty. We identify three asymmetries between how institutional accountability is structured and how people perceive the process: epistemic asymmetry in access to decision logic, jurisdictional asymmetry in exposure shaped by geopolitical positioning, and temporal-relational asymmetry in how waiting and uncertainty are experienced. We emphasize why it is important to shift attention from institutional design to the uneven distribution of experiences with public-sector algorithmic governance. Together, these contributions demonstrate how algorithmic governance systems in the context of transnational migration produce structured asymmetries not captured by institutional disclosure frameworks, and how extending ADMAPS can account for those uneven translations of accountability.

2606.13069 2026-06-12 cs.NI New submission

Modular Multi-Domain Digital Twin Architecture: Sustainable Intent-Driven 6G Management

Berk Buzcu, Marcin Pakula, Gevher Yesevi Keskin, Laura Finarelli, Gianluca Rizzo, Engin Zeydan, Jorge Baranda, Aitor Alcazar-Fernandez, Javier Velazquez-Martinez, Luis M. Contreras, Gil Kedar, Efi Dvir, Paweł Kryszkiewicz

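AI summary: Proposes a multi-domain orchestration architecture that exposes network digital twins as a specialized service domain. A DT Orchestrator handles predictive and prescriptive what-if queries to enable cross-domain, sustainable, intent-driven 6G management, reducing daily grid consumption by 28.5% in a green-networking use case.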

Comments
11 pages, submitted to IEEE JSAC call titled "Digital Twins for Wireless Networks: Enabling Application-Aware and Closed-Loop Optimization"

Abstract

Future 6G networks will operate across distributed and heterogeneous domain infrastructures, making conventional single-domain management insufficient for proactive, trustworthy automation. Network Digital Twins (NDTs) enable what-if analysis, AI-assisted optimization, and risk-free validation of control actions before deployment, yet monolithic end-to-end twins remain impractical due to scalability, fidelity, and cross-domain coordination challenges. Accordingly, this paper proposes a Digital Twin-enabled 6G architecture that exposes NDT capabilities as a specialized service domain within a multi-domain orchestration framework built on a state-of-the-art service-based 6G architecture. A DT Orchestrator interprets predictive and prescriptive what-if queries and composes domain-specific DT modules and simulators on demand, while decision authority remains with the requesting entity. Furthermore, a generalized workflow covers telemetry synchronization, simulation-based decision support, and closed-loop execution. The framework is demonstrated through a green-networking use case that couples a system-level O-RAN cellular digital twin component with a two-stage solar-allocation simulator, evaluated over a 105-base-station deployment in Poznan using simulative datasets. Joint coverage and renewable optimization reduces daily grid consumption by 28.5% with 32 solar panels at the diminishing-returns threshold, with 17 base stations identified as both coverage-active and high-priority solar candidates as evidence that cross-domain NDT coordination enables sustainable, intent-driven 6G network management.

2606.13068 2026-06-12 cs.MA cs.RO New submission

Effects of Social Interactions in Self-Organising Railway Traffic Management

Fabio Oddi, Federico Naldini, Leo D'Amato, Grégory Marlière, Paola Pellegrini, Vito Trianni

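AI summary: Studies how the predictive neighbourhood horizon affects the distributed coordination process in self-organising railway traffic management, finding that short horizons suffice and that long horizons harm local tractability and computational responsiveness with no global gain.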

英文摘要

Recent research is exploring self-organised traffic management as a solution for scaling to complex real-world networks. In such a system, trains predict their neighbourhood, produce traffic plan hypotheses, and agree via consensus with neighbours on a future traffic plan to be implemented. This paper investigates a structural parameter within this pipeline: the predictive neighbourhood horizon. The horizon is used by trains to identify future potential conflicts with neighbours, and to establish the local interaction topology, that is, the subset of trains to negotiate with. As the primary design variable, the horizon directly determines the size and density of the social interaction graph, whereas its impact on the complexity of local sub-problems and the distributed consensus dynamics represents a trade-off to be explored. Through a closed-loop simulation framework, the study evaluates how variations of the horizon impact the overall decentralised coordination process, from initial conflict detection to distributed schedule consensus. The analysis focuses on investigating the potential trade-off introduced by the horizon choice: balancing local tractability and computational responsiveness with the need for global schedule coherence and feasibility in safety-critical environments. Contrary to intuition, our empirical results indicate that short time horizons suffice, while long horizons compromise local tractability and computational responsiveness with no gain in global schedule optimality.

2606.13067 2026-06-12 cs.LG 新提交

Limits of spectral learning under noise

噪声下谱学习的极限

Sabin Roman, Ljupco Todorovski, Saso Dzeroski, Marta Sales-Pardo, Roger Guimera

发表机构 * Jožef Stefan Institute(约瑟夫·斯特凡研究所) Faculty of Mathematics and Physics, University of Ljubljana(卢布尔雅那大学数学与物理学院) Department of Chemical Engineering, Universitat Rovira i Virgili(罗维拉-威尔吉利大学化学工程系) Center for Computational Science and Applied Mathematics (ComSCIAM), Universitat Rovira i Virgili(罗维拉-威尔吉利大学计算科学与应用数学中心) ICREA(加泰罗尼亚研究与高等研究院)

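AI summary: Studies how additive label noise affects spectral methods in supervised regression, deriving a closed-form expression for the overlap between noisy and noiseless coefficient vectors and revealing a universal degradation curve governed by a single intrinsic noise scale.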

详情
AI中文摘要

从含噪数据中学习函数关系是科学推理的核心问题。谱方法通过将未知函数在基函数上展开并从数据中估计相应系数来逼近函数,但这些系数在噪声下的稳定性仍知之甚少。本文研究使用稀疏谱表示在多个基和维度下进行带加性标签噪声的监督回归。我们表明,噪声会导致学习到的系数向量发生可预测的漂移,其大小取决于有效活跃谱模式的数量。在对经验特征几何进行白化后,我们推导出含噪与无噪系数向量之间重叠的闭合表达式,揭示了一条由单一内在噪声尺度控制的通用退化曲线。在傅里叶、勒让德、贝塞尔和哈尔基上的数值实验证实了理论预测。结果表明,谱学习存在一个基本噪声阈值,超过该阈值系数估计变得不稳定,从而对从含噪数据中恢复函数结构施加了内在限制。

英文摘要

Learning functional relationships from noisy data is a central problem in scientific inference. Spectral methods approximate unknown functions by expanding them in a basis and estimating the corresponding coefficients from data, but the stability of these coefficients under noise remains poorly understood. Here we study supervised regression with additive label noise using sparse spectral representations across multiple bases and dimensions. We show that noise induces a predictable drift in the learned coefficient vector whose magnitude depends on the effective number of active spectral modes. After whitening the empirical feature geometry, we derive a closed-form expression for the overlap between noisy and noiseless coefficient vectors, revealing a universal degradation curve governed by a single intrinsic noise scale. Numerical experiments across Fourier, Legendre, Bessel, and Haar bases confirm the theoretical prediction. The results demonstrate that spectral learning exhibits a fundamental noise threshold beyond which coefficient estimates become unstable, placing intrinsic limits on recovering functional structure from noisy data.
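The overlap between noisy and noiseless coefficient vectors can be illustrated with a small experiment. The following is a minimal sketch, not the authors' code: it assumes a cosine basis sampled on a midpoint grid (where the basis columns are exactly orthogonal, so least squares reduces to projections), a hypothetical three-mode sparse target, and cosine similarity as the overlap measure. The paper's closed-form expression is not reproduced here.

```python
import math
import random


def cosine_design(n_samples, n_modes):
    """Cosine basis cos(pi*k*x) on the midpoint grid x_i = (i + 0.5) / n."""
    xs = [(i + 0.5) / n_samples for i in range(n_samples)]
    return [[math.cos(math.pi * k * x) for k in range(n_modes)] for x in xs]


def fit_coefficients(design, labels):
    """Least-squares fit; on this grid the columns are orthogonal,
    so each coefficient is a normalized projection."""
    n = len(design)
    coeffs = []
    for k in range(len(design[0])):
        norm = n if k == 0 else n / 2  # squared column norms on this grid
        coeffs.append(sum(row[k] * y for row, y in zip(design, labels)) / norm)
    return coeffs


def overlap(a, b):
    """Cosine similarity between two coefficient vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def noisy_overlap(sigma, n_samples=256, n_modes=32, seed=0):
    """Overlap between coefficients fitted to clean and to noisy labels."""
    rng = random.Random(seed)
    true = [0.0] * n_modes
    true[1], true[3], true[7] = 1.0, -0.5, 0.25  # hypothetical sparse target
    design = cosine_design(n_samples, n_modes)
    clean = [sum(c * phi for c, phi in zip(true, row)) for row in design]
    noisy = [y + rng.gauss(0.0, sigma) for y in clean]
    return overlap(fit_coefficients(design, clean),
                   fit_coefficients(design, noisy))


if __name__ == "__main__":
    for sigma in (0.0, 0.01, 0.1, 1.0, 10.0):
        print(sigma, round(noisy_overlap(sigma), 4))
```

Sweeping `sigma` while also varying `n_modes` is one way to see the dependence on the number of fitted modes that the abstract describes, since every fitted mode picks up its own share of the noise.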

2606.13061 2026-06-12 cs.CV 新提交

LaME: Learning to Think in Latent Space for Multimodal Embedding via Information Bottleneck

LaME: 通过信息瓶颈在潜在空间中进行多模态嵌入的推理学习

Peixi Wu, Biao Yang, Feipeng Ma, Bosong Chai, Bo Lin, Wei Yuan, Fan Yang, Tingting Gao, Hebei Li, Xiaoyan Sun

发表机构 * University of Science and Technology of China(中国科学技术大学) Kuaishou Technology(快手科技) Zhejiang University(浙江大学) Tsinghua University(清华大学)

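AI summary: Proposes LaME, which formulates embedding-oriented latent reasoning as a weakly supervised information bottleneck and uses learnable reason tokens to complete reasoning in a single forward pass, avoiding the high computational cost and annotation dependence of explicit CoT and delivering 60x faster inference than explicit CoT methods.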

详情
AI中文摘要

基于推理的通用多模态嵌入通过将思维链(CoT)推理引入嵌入流程取得了快速进展。尽管在通用和复杂任务上表现强劲,该范式存在两个核心限制:(i) 自回归CoT推理计算成本高,使其不适用于低延迟检索;(ii) 嵌入性能与CoT标注质量高度耦合,导致大规模训练不可靠。这些引出了基本问题:文本CoT是否是嵌入的最优推理形式,以及有效的嵌入推理能否在潜在空间中完成?为此,我们提出LaME(潜在推理多模态嵌入),将面向嵌入的潜在推理建模为弱监督信息瓶颈。LaME采用K个可学习推理令牌作为固定容量瓶颈,在单次前向传播中完成所有推理。两个弱监督信号在结构上解耦了对比目标和自回归目标,消除了对CoT标注的依赖,而两阶段训练流程确保了稳定收敛。在MMEB-v2和MRMR上的实验表明,LaME达到了有竞争力的性能,超越了某些显式CoT模型,同时推理速度比显式CoT方法快60倍,比潜在基线快2倍,吞吐量与判别式嵌入模型相当。代码将开源。

英文摘要

Reasoning-driven universal multimodal embedding has advanced rapidly by introducing Chain-of-Thought (CoT) reasoning into the embedding pipeline. Despite the strong performance across both general and complex tasks, this paradigm suffers from two core limitations: (i) autoregressive CoT reasoning incurs high computational cost, making it impractical for low-latency retrieval; and (ii) embedding performance is heavily coupled with CoT annotation quality, making large-scale training unreliable. These raise fundamental questions: Is textual CoT the optimal form of reasoning for embedding, and can effective embedding reasoning be accomplished in latent space? To this end, we propose LaME (Latent Reasoning Multimodal Embedding), which formulates embedding-oriented latent reasoning as a weakly supervised information bottleneck. LaME employs K learnable reason tokens as a fixed-capacity bottleneck, completing all reasoning within a single forward pass. The two weak supervision signals structurally decouple contrastive from autoregressive objectives and eliminate dependence on CoT annotations, while a two-stage training pipeline ensures stable convergence. Experiments on MMEB-v2 and MRMR show that LaME achieves competitive performance, surpassing some explicit CoT-based models, while delivering 60x faster inference than explicit CoT methods and 2x faster than latent baselines with throughput comparable to discriminative embedding models. Code will be released.

2606.13060 2026-06-12 cs.LG 新提交

A green solvent screening tool for emerging materials via uncertainty aware, transformer enhanced transfer learning

一种面向新兴材料的绿色溶剂筛选工具:基于不确定性感知、Transformer增强的迁移学习

Ioannis Kouroudis, Simon Ternes, Zhaosu Gu, Gohar Ali Siddiqui, Marina Ustinova, Angelo Lembo, Alessio Gagliardi, Aldo Di Carlo

发表机构 * Technical University of Munich(慕尼黑工业大学) Institute of Structure of Matter – National Research Council Rome (ISM-CNR)(罗马国家研究委员会物质结构研究所) University of Rome "Tor Vergata"(罗马第二大学)

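AI summary: Proposes a transfer-learning approach that combines a pretrained Transformer model with uncertainty quantification to predict solubility parameters accurately from very little data, and develops a customizable green solvent screening tool.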

详情
AI中文摘要

溶解度的准确预测仍然是材料科学和可持续化学中的一个核心挑战。特别是由于有机和混合光伏、电池、催化等新兴技术,溶剂使用量预计在未来几年将显著增加。因此,用更绿色的替代品取代溶剂至关重要。这正是机器学习可以产生重大影响的地方。然而,溶解度关键参数的数据有限,严重制约了机器学习的效能。在这项工作中,我们将预训练的QM9基础模型迁移到我们的应用中,所需数据极少。此外,该流程集成了不确定性量化,允许用户评估预测的置信度。作为基线,我们成功预测了存在大量数据库的汉森溶解度参数和介电常数。重要的是,我们在其他目标(如Gutmann供体和受体数)上实现了高模型性能,而这些目标的可获得数据极为有限。总体而言,我们通过高质量预测将溶解度描述符的数据量提高了数个数量级。为了有效传播,我们部署了一个易于使用、易于与高通量实验室集成、可定制的工具,用于排序和筛选可能的溶剂替代品。最后,我们重新发现了已知的绿色溶剂替代品,并提出了新的候选者,证明了其在寻找环保溶剂方面的相关性。

英文摘要

Accurate prediction of solubility remains a central challenge across materials science and sustainable chemistry. In particular, due to emerging technologies like organic and hybrid photovoltaics, batteries, and catalysis, solvent usage is expected to increase significantly within the coming years. Therefore, substituting solvents with greener alternatives is vital. This is where machine learning can have substantial impact. However, the limited data on critical parameters of solubility significantly constrains machine learning efficacy. In this work, we transfer a foundation model pre-trained on QM9 targets to our application with minimal data requirements. Additionally, the pipeline integrates uncertainty quantification, allowing the user to gauge the confidence of the predictions. As a baseline, we succeed in predicting the Hansen solubility parameters and the dielectric constant, for which extensive databases exist. Importantly, we achieve high model performance on additional targets, such as Gutmann donor and acceptor numbers, where the available data is extremely limited. Overall, we augment data on solubility descriptors by orders of magnitude with high-quality predictions. For effective dissemination, we deploy an easy-to-use, customizable tool for ranking and screening possible solvent substitutes that integrates easily with high-throughput labs. Finally, we rediscover known green solvent alternatives and propose new candidates, demonstrating the tool's relevance for finding eco-friendly solvents.

2606.13057 2026-06-12 cs.GT 新提交

Approximate Maximin Share with Subjective Divisibility: Beating the 1/2 Barrier

主观可分割性下的近似最大最小份额:突破1/2障碍

Xiaohui Bei, Ke Ding, Bo Li, Fangxiao Wang

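AI summary: For fair allocation under subjective divisibility, proves that the optimal approximation ratio under unary valuations is 2/3, designs algorithms that improve the general-case approximation guarantee from 1/2 to 5/9, and gives a tight 2/3-approximation for up to four agents.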

详情
AI中文摘要

最大最小份额(MMS)是公平资源分配中的核心概念。已知精确MMS公平性并非总能实现,尤其是当智能体在两个维度上存在差异时:他们的估值和对资源可分割性的感知。前者(异质估值)已在文献中得到广泛研究。后者被称为主观可分割性(Bei等人,[Games Econ. Behav. 2025]),其研究仍较少。我们研究主观可分割性下的MMS近似。首先,我们证明即使在一元估值设置(所有物品价值相等)下,最优近似比为2/3。这一结果有些令人惊讶,因为在客观设置中,即使智能体具有异质估值,最佳可能近似比至少为7/9 [Huang and Zhou, 2025]。然后,我们处理同时具有估值异质性和主观可分割性的一般情形。先前工作表明存在1/2近似MMS分配。在本文中,我们开发了新的算法技术,克服了主观可分割性带来的困难,并将近似保证提升至5/9。最后,我们用小规模智能体情形补充了这一结果。对于至多四个智能体,我们给出了多项式时间算法,计算2/3近似MMS公平分配。这些界是紧的。我们的结果加深了对异质估值和主观可分割性下MMS公平性的理解,并为这一新兴模型提供了新视角。

英文摘要

Maximin share (MMS) stands out as a central notion in fair resource allocation. It is known that exact MMS fairness is not always attainable, especially when agents differ along two dimensions: their valuations and their perceptions of the divisibility of resources. The former case with heterogeneous valuations has been widely studied in the literature. The latter, referred to as subjective divisibility by Bei et al., [Games Econ. Behav. 2025], remains much less explored. We study MMS approximation under subjective divisibility. First, we prove that even in the unary valuation setting, where all items have equal value, the optimal approximation ratio is 2/3. This result is somewhat surprising since in the objective setting, even when agents have heterogeneous valuations, the best possible approximation ratio is at least 7/9 [Huang and Zhou, 2025]. We then address the general case with both valuation heterogeneity and subjective divisibility. Previous work shows the existence of a 1/2-approximate MMS allocation. In this paper, we develop new algorithmic techniques that overcome the difficulties posed by subjective divisibility, and improve the approximation guarantee to 5/9. Finally, we complement this result with small-agent cases. For up to four agents, we give polynomial-time algorithms that compute 2/3-approximate MMS fair allocations. These bounds are tight. Our results deepen the understanding of MMS fairness under heterogeneous valuations and subjective divisibility, and provide a new perspective for this emerging model.
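For readers unfamiliar with the maximin share, the definition can be made concrete with a brute-force computation. This is a minimal sketch for the classical setting with indivisible items only. It does not model subjective divisibility or any of the paper's algorithms, and the function names and item values are hypothetical.

```python
from fractions import Fraction
from itertools import product


def maximin_share(values, n_agents):
    """Largest value an agent can guarantee for the worst bundle when it
    partitions the items into n_agents bundles itself (exhaustive search,
    exponential in the number of items)."""
    best = 0
    for assignment in product(range(n_agents), repeat=len(values)):
        bundles = [0] * n_agents
        for value, owner in zip(values, assignment):
            bundles[owner] += value
        best = max(best, min(bundles))
    return best


def is_alpha_mms(bundle_value, mms, alpha):
    """True if a received bundle is worth at least alpha times the MMS."""
    return bundle_value >= alpha * mms


if __name__ == "__main__":
    values = [7, 5, 4, 3, 1]          # one agent's item values (hypothetical)
    mms = maximin_share(values, 3)    # best partition: {7}, {5, 1}, {4, 3}
    print(mms)
    print(is_alpha_mms(4, mms, Fraction(2, 3)))
```

An allocation is α-approximate MMS when `is_alpha_mms` holds for every agent with respect to that agent's own valuation and MMS value. The ratios 1/2, 5/9, and 2/3 in the abstract are values of α.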

2606.13054 2026-06-12 cs.LG cs.AI 新提交

TWLA: Achieving Ternary Weights and Low-Bit Activations for LLMs via Post-Training Quantization

TWLA:通过训练后量化实现大语言模型的三值权重和低位激活

Zhixiong Zhao, Zukang Xu, Zhixuan Chen, Xing Hu, Zhe Jiang, Dawei Yang

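AI summary: Proposes TWLA, a post-training quantization framework that achieves 1.58-bit weights and 4-bit activations, addressing heavy-tailed activation distributions and accelerating inference.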

详情
Comments
Accepted by ICML 2026
AI中文摘要

大型语言模型(LLMs)展现出卓越的通用语言处理能力,但其内存和计算成本阻碍了部署。三值化已成为一种有前景的压缩技术,可显著降低模型大小和推理复杂度。然而,现有方法难以处理重尾激活分布,因此将激活保持在高精度,从根本上限制了端到端推理加速。为克服这一限制,我们提出TWLA,一种后训练量化(PTQ)框架,在保持高精度的同时实现1.58位权重压缩和4位激活量化。TWLA包含三个组件:(1)欧几里得到流形非对称三值量化器(E2M-ATQ),通过从欧几里得初始化到流形重定位的两阶段优化,最小化权重三值化下的层输出误差;(2)Kronecker正交三模态整形(KOTMS),应用Kronecker结构正交旋转将权重重塑为三值友好的三模态分布,同时共享旋转统计上抑制激活异常值;(3)层间感知激活混合精度(ILA-AMP),在位分配中显式引入相邻层二阶交互成本,并联合优化由共享正交变换引起的激活量化增益的层间差异,防止少数弱层触发级联效应。大量实验表明,TWLA在W1.58A4下保持高精度,同时实现显著的推理加速。代码见<此https URL>。

英文摘要

Large language models (LLMs) exhibit exceptional general language processing capabilities, but their memory and compute costs hinder deployment. Ternarization has emerged as a promising compression technique, offering significant reductions in model size and inference complexity. However, existing methods struggle with heavy-tailed activation distributions and therefore keep activations in high precision, fundamentally limiting end-to-end inference acceleration. To overcome this limitation, we propose TWLA, a post-training quantization (PTQ) framework that achieves 1.58-bit weight compression and 4-bit activation quantization while maintaining high accuracy. TWLA comprises three components: (1) Euclidean-to-Manifold Asymmetric Ternary Quantizer (E2M-ATQ) minimizes layer-output error under weight ternarization via a two-stage optimization from Euclidean initialization to manifold relocation; (2) Kronecker Orthogonal Tri-Modal Shaping (KOTMS) applies a Kronecker-structured orthogonal rotation to reshape weights into ternary-friendly tri-modal distributions, while the shared rotation statistically suppresses activation outliers; and (3) Inter-Layer Aware Activation Mixed Precision (ILA-AMP) explicitly introduces adjacent-layer second-order interaction costs in bit allocation and jointly optimizes for the layer-wise disparity of activation quantization gains induced by the shared orthogonal transform, preventing cascades triggered by a few weak layers. Extensive experiments demonstrate that TWLA maintains high accuracy under W1.58A4, while delivering significant inference acceleration. The code is available at this https URL.
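To make "1.58-bit" concrete: a ternary weight takes one of three values, which carries log2(3) ≈ 1.58 bits of information. The sketch below shows a generic symmetric ternary quantizer with absolute-mean scaling, a common baseline. It is not the paper's E2M-ATQ, which is described as asymmetric and optimized against layer-output error, and the function names are hypothetical.

```python
import math


def ternarize(weights):
    """Map each weight to a code in {-1, 0, +1} plus one shared scale.

    Baseline rule: scale by the mean absolute weight, round to the
    nearest integer, then clip to the ternary range."""
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        return [0] * len(weights), 0.0
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Reconstruct approximate weights from ternary codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [0.9, -1.1, 0.05, 2.0, -0.2]
    codes, scale = ternarize(weights)
    print(codes, scale)
    print(dequantize(codes, scale))
    print(math.log2(3))  # bits per ternary weight
```

Note how the outlier `2.0` is clipped to the same code as `0.9`. Heavy tails cause the same kind of loss in activations, which is the problem the abstract says the shared rotation is meant to suppress.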

2606.13053 2026-06-12 cs.RO cs.AI 新提交

EA-WM: Event-Aware World Models with Task-Specification Grounding for Long-Horizon Manipulation

EA-WM: 基于任务规范基础的事件感知世界模型用于长时域操作

Kailin Wang, Haoxiang Jie, Yaoyuan Yan, Jiacheng Zhou, Zhiyou Heng

发表机构 * AI Lab, Country Garden Services Group(碧桂园服务集团AI实验室) Fudan University(复旦大学) Omni AI

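AI summary: Proposes the EA-WM framework, which augments pretrained-feature world models with event prediction and verification to reliably evaluate task-progress signals and plan in long-horizon manipulation.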

详情
AI中文摘要

预训练特征世界模型为机器人想象提供了有用的基础,但仅凭视觉或潜在预测并不能确定想象的未来是否满足任务相关事件。长时域操作需要关系性、谓词级和物理基础的进展信号:物体是否移动,抽屉或接触状态是否改变,放置谓词是否满足,以及候选未来是否足够可靠以执行。我们引入了EA-WM,一种事件感知世界模型框架,通过任务规范基础的事件预测和验证来增强冻结的视觉特征动力学。EA-WM在预训练视觉特征空间中展开候选未来,将其解码为结构化事件状态,并使用任务进展、语义一致性、物理可行性和不确定性项进行评分。验证器指导基于采样的规划,门控候选动作,并在接触敏感的LIBERO酒架设置中,选择PPO生成的提议。在导航、可变形物体、墙壁约束和语言描述的操作研究中,EA-WM表明事件感知验证可以使特征空间世界模型更可解释,并更好地与任务进展对齐。

英文摘要

Pretrained-feature world models provide a useful substrate for robot imagination, but visual or latent prediction alone does not determine whether an imagined future satisfies task-relevant events. Long-horizon manipulation requires progress signals that are relational, predicate-level, and physically grounded: whether an object has moved, whether a drawer or contact state has changed, whether a placement predicate is satisfied, and whether a candidate future is reliable enough for execution. We introduce EA-WM, an event-aware world-model framework that augments frozen visual-feature dynamics with task-specification-grounded event prediction and verification. EA-WM rolls out candidate futures in pretrained visual-feature space, decodes them into structured event states, and scores them using task-progress, semantic-consistency, physical-feasibility, and uncertainty terms. The verifier guides sampling-based planning, gates candidate actions, and, in the contact-sensitive LIBERO wine-rack setting, selects among PPO-generated proposals. Across navigation, deformable-object, wall-constrained, and language-described manipulation studies, EA-WM shows that event-aware verification can make feature-space world models more interpretable and better aligned with task progress.

2606.13051 2026-06-12 cs.AI 新提交

AAbAAC: An Annotated Corpus for Autoimmunity Information Extraction

AAbAAC:用于自身免疫信息抽取的标注语料库

Fabien Maury (Imagine - U1163, HeKA | U1346), Solène Grosdidier, Maud de Dieuleveult (Imagine - U1163), Adrien Coulet (HeKA | U1346)

发表机构 * Inserm, Université Paris Cité, U1163 Institut Imagine(法国国家健康与医学研究院、巴黎西岱大学、U1163 想象研究所) Inria, Inserm, Université Paris Cité, U1346 HeKA(法国国家信息与自动化研究所、法国国家健康与医学研究院、巴黎西岱大学、U1346 HeKA) Freelance researcher(自由研究员)

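AI summary: To address insufficient information-extraction performance in the autoimmunity domain, builds the AAbAAC corpus of 115 PubMed abstracts with manually annotated entities and relations, and validates its usefulness by fine-tuning NER models.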

详情
AI中文摘要

尽管深度学习和大型语言模型推动了信息抽取的进步,但在高度专业化的生物医学领域,领域特异性复杂性对通用模型构成挑战,性能差距仍然存在。本文聚焦自身免疫领域,其中主要关注实体包括自身免疫疾病、自身抗体(即可能标记或导致这些疾病的分子)、其分子靶点、在体内的位置以及相关临床体征。我们提出了AAbAAC(自身抗体与自身免疫标注语料库),该语料库包含从PubMed精选的115篇摘要,并手动标注了实体及其关系。首先,AAbAAC被用于评估多种方法在命名实体识别(NER)任务上的表现;其次,用于微调NER模型。我们的研究展示了AAbAAC在自身免疫领域信息抽取中的实用性,表明微调后NER性能预期提升。这说明了小规模标注工作对专业领域的价值,并为自身免疫的计算研究做出了贡献。AAbAAC语料库可通过此https链接获取。

英文摘要

Despite advances in information extraction driven by deep learning and large language models, performance gaps remain in highly specialized biomedical fields, where domain-specific complexity poses challenges for generalist models. In this work, we focus on the domain of autoimmunity, where the main entities of interest are autoimmune diseases, autoantibodies (i.e., molecules that may mark or cause these diseases), their molecular targets, their location in the body, and their associated clinical signs. Herein, we present AAbAAC (AutoAntibodies and Autoimmunity Annotated Corpus), a corpus of 115 abstracts selected from PubMed, where we manually annotated entities and their relationships. AAbAAC was used first to evaluate several methods on the task of named entity recognition (NER), and second to fine-tune NER models. Our study demonstrates the utility of AAbAAC for information extraction in the domain of autoimmunity, showing the expected improvement in NER performance after fine-tuning. This illustrates the value of small-scale annotation efforts for specialized domains and contributes to the computational study of autoimmunity. The AAbAAC corpus is available at this https URL.

2606.13049 2026-06-12 cs.RO 新提交

Y-BotFrame: An Extensible Embodied Agent Framework for Quadruped Robot Assistants

Y-BotFrame:一种用于四足机器人助手的可扩展具身智能体框架

Luyao Zhang, Ke Li, Yuan Ding, Xulong Zhao, Guo Yu, Chengwei Yan, Fuyu Dong, Jiawei Hu, Di Wang, Nan Luo, Gang Liu, Quan Wang

发表机构 * Xidian University(西安电子科技大学)

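AI summary: Proposes the Y-BotFrame framework, which integrates multimodal perception with a large language model as its cognitive core and maps natural-language instructions to executable task units, enabling human-robot collaboration without a remote controller and supporting modular extension.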

详情
AI中文摘要

四足机器人能够以高灵活性穿越各种复杂地形。作为高机动性的地面智能平台,它们可以配备导航控制、环境感知和智能交互模块,从而成为各种算法在现实世界中的移动部署平台。本文介绍了Y-BotFrame,一个可扩展的具身平台,它将机器人转变为智能地面助手。Y-BotFrame集成了多模态感知能力,包括语音、视觉和激光雷达,并采用大语言模型作为环境理解、上下文推理和任务规划的认知核心。该系统将用户的自然语言指令映射为机器人可执行的具体任务单元。Y-BotFrame通过语音命令和视觉反馈支持自然交互,无需遥控器即可实现高效的人机协作。凭借高度可扩展的框架,Y-BotFrame支持新功能模块的即插即用集成以及模块化升级和迭代开发,为通用、指令驱动的具身智能体在现实世界中的部署提供了参考实现。补充视频见https://this https URL。

英文摘要

Quadruped robots are capable of traversing a wide range of complex terrains with high flexibility. As highly mobile ground-based intelligent platforms, they can be equipped with modules for navigation control, environmental perception, and intelligent interaction, thereby serving as real-world mobile deployment platforms for various algorithms. In this paper, we introduce Y-BotFrame, an extensible embodied platform that turns a robot into an intelligent ground assistant. Y-BotFrame integrates multimodal perception capabilities, including speech, vision, and LiDAR, and employs a large language model as the cognitive core for environmental understanding, contextual reasoning, and task planning. The system maps user natural-language instructions into executable embodied task units that can be carried out by the robot. Y-BotFrame supports natural interaction through voice commands and visual feedback, removing the need for a remote controller and enabling efficient human-robot collaboration. With a highly extensible framework, Y-BotFrame supports plug-and-play integration of new functional modules as well as modular upgrades and iterative development, offering a reference implementation for the real-world deployment of general-purpose, instruction-driven embodied agents. The supplementary video is available at this https URL.

2606.13044 2026-06-12 cs.CL 新提交

No Hidden Prompts Needed! You Can Game AI Peer Review with Presentation-Only Revisions

无需隐藏提示!仅通过展示性修改即可欺骗AI同行评审

Xu Yang, Zhizhou Sha, Junbo Li, Jian Yu, Yifan Sun, Matthew Zhao, Jinrui Fang, Xinyue Guo, Yining Wu, Xu Hu, Yifu Luo, Qiang Liu, Zhangyang Wang

发表机构 * University of Texas at Austin(德克萨斯大学奥斯汀分校) University of Illinois Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校) University of Texas at Dallas(德克萨斯大学达拉斯分校) Independent Researcher(独立研究者)

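AI summary: Shows that modifying only a paper's presentation (abstract, contribution framing, and so on) without changing its scientific content, using AI-reviewer feedback for adversarial repackaging, raises review scores, exposing a structural weakness: AI reviewers are easily swayed by surface impressions.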

详情
Comments
35 pages, 5 figures
AI中文摘要

随着AI生成的评审从实验工具转向同行评审基础设施,大多数鲁棒性问题集中在显式攻击上,如隐藏指令和提示注入。我们研究了一个更难且更具政策相关性的失败模式:无隐藏文本、无提示注入,且不改变方法、实验、图表、方程、证明或数值结果。攻击者仅修改展示层面的内容,如摘要、贡献框架、相关工作、讨论和叙事结构。我们引入了对抗性重打包:一种闭环攻击,利用AI评审反馈搜索展示层面的修订,同时保持科学证据不变。在三个主流AI评审器上,对抗性重打包实现了75.1%的攻击成功率和平均+1.21/10的分数提升。这种效果不能用普通的散文润色来解释。我们还揭示,改变评审者对论文解读方式的策略(如相关工作重新定位和分析性讨论扩展)显著优于表面编辑(如局部润色、表格格式和算法框)。我们的分析揭示了两个更深层次的结构性失败模式。首先,AI评审者更容易被打动而非说服:突出优点可靠地增加感知价值,而试图消除弱点常常适得其反。其次,AI评审者可能混淆了表面解决局限性与实际解决局限性,使得未改变的证据被重新解释为更强的科学贡献。这些结果表明,部署风险不仅在于恶意的隐藏指令,还在于论文展示本身作为优化表面的出现。我们发布了一个无污染滚动基准和攻击框架,用于测试AI评审者在仅展示层面编辑下是否仍锚定于科学内容。

英文摘要

As AI-generated reviews move from experimental tools into peer-review infrastructure, most robustness concerns have focused on explicit attacks such as hidden instructions and prompt injection. We study a harder and more policy-relevant failure mode: no hidden text, no prompt injection, and no changes to methods, experiments, figures, equations, proofs, or numerical results. The attacker modifies only presentation-level content, such as the abstract, contribution framing, related work, discussion, and narrative structure. We introduce adversarial repackaging: a closed-loop attack that uses AI-reviewer feedback to search for presentation-level revisions while keeping the scientific evidence fixed. Across three mainstream AI reviewers, adversarial repackaging achieves a 75.1% attack success rate and a mean score gain of +1.21/10. The effect is not explained by ordinary prose polishing. We also reveal that strategies that change how the reviewer interprets the paper, such as related-work repositioning and analytical discussion expansion, substantially outperform surface edits such as local polishing, table formatting, and algorithm boxes. Our analysis reveals two deeper structural failure modes. First, AI reviewers are easier to impress than to convince: highlighting strengths reliably increases perceived merit, while attempts to dissolve weaknesses frequently backfire. Second, AI reviewers can confuse the appearance of addressing a limitation with actually resolving it, allowing unchanged evidence to be reinterpreted as stronger scientific contribution. These results show that the deployment risk is not only malicious hidden instructions, but the emergence of paper presentation itself as an optimization surface. We release a contamination-free rolling benchmark and attack framework for testing whether AI reviewers remain anchored to scientific content under presentation-only edits.

2606.13042 2026-06-12 cs.AI cs.CV 新提交

Augmentation techniques for video surveillance in the visible and thermal spectral range

可见光和热红外光谱范围内视频监控的增强技术

Vanessa Buhrmester, Ann-Kristin Grosselfinger, David Munch, Michael Arens

发表机构 * Fraunhofer Institute of Optronics, System Technologies and Image Exploitation IOSB(弗劳恩霍夫光学、系统技术与图像处理研究所)

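AI summary: For multispectral CNN-based object detection, examines the differences between visible and thermal infrared images and explores how data augmentation techniques affect classification accuracy, with the aim of improving surveillance performance.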

详情
Comments
8 pages
AI中文摘要

在智能视频监控中,摄像机在白天和夜晚记录图像序列。通常,这需要不同的传感器。为了获得更好的性能,将它们结合起来并不罕见。我们关注的情况是,长波红外摄像机连续记录,此外,另一台摄像机在白天记录可见光谱范围内的图像,并且智能算法监控采集的图像。更准确地说,我们的任务是基于多光谱CNN的目标检测。乍一看,可见光谱范围内的图像与热红外图像的区别在于,前者具有颜色和清晰的纹理信息,而后者不包含物体发出的热辐射信息。尽管颜色可以为分类任务提供有价值的信息,但诸如光照变化和不同传感器的特性等因素仍然构成重大问题。无论如何,获取足够且实用的热红外数据集来训练深度神经网络仍然是一个挑战。这就是为什么借助可见光谱范围内的数据进行训练可能是有利的,特别是当待评估的数据同时包含可见光和红外数据时。然而,目前尚不清楚热辐射、形状或颜色信息的强烈变化如何影响分类精度。为了更深入地了解卷积神经网络如何做出决策以及它们从不同传感器输入数据中学到什么,我们研究了不同增强技术的适用性和鲁棒性。

英文摘要

In intelligent video surveillance, cameras record image sequences day and night. This commonly demands different sensors, and it is not unusual to combine them to achieve better performance. We focus on the case in which a long-wave infrared camera records continuously while a second camera records in the visible spectral range during daytime, and an intelligent algorithm monitors the captured imagery. More precisely, our task is multispectral CNN-based object detection. At first glance, images from the visible spectral range differ from thermal infrared ones in two ways: they contain color and distinct texture information, but they carry no information about the thermal radiation emitted by objects. Although color can provide valuable information for classification tasks, effects such as varying illumination and the peculiarities of different sensors still pose significant problems. Moreover, obtaining sufficient and practical thermal infrared datasets for training a deep neural network remains a challenge. For this reason, training with the help of data from the visible spectral range could be advantageous, particularly if the data to be evaluated contains both visible and infrared imagery. However, there is no clear evidence of how strongly variations in thermal radiation, shape, or color information influence classification accuracy. To gain deeper insight into how convolutional neural networks make decisions and what they learn from different sensor input data, we investigate the suitability and robustness of different augmentation techniques...

2606.13041 2026-06-12 cs.CV cs.GR cs.MM 新提交

SeamEdit: A Black-Box VLM-Agnostic Pipeline for Large-Image Semantic Editing

SeamEdit: 一种用于大图像语义编辑的黑盒VLM无关流水线

Xiangyu Lyu, Dan Lei

发表机构 * Technische Universität Darmstadt(达姆施塔特工业大学) Fine-Arts Educator, Yuncheng Middle School(运城中学美术教师)

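AI summary: Proposes SeamEdit, a training-free, model-agnostic pipeline whose five-stage post-hoc processing addresses semantic deformation, alignment drift, and seam artifacts in tiled editing of large images, enabling high-quality semantic editing.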

详情
Comments
19 pages, 9 figures, 2 tables
AI中文摘要

大图像的语义区域编辑必须同时满足两个要求:高生成质量和与周围内容的自然融合。一些相关方法依赖于白盒模型,而忽略了闭源模型的强大生成能力。然而,直接将闭源模型应用于分块编辑会引入几种失败模式:语义变形、画布级对齐漂移和可见接缝伪影。本文提出SeamEdit,一种无需训练且模型无关的流水线,将任何具有修补能力的VLM视为黑盒预言机。SeamEdit通过五阶段后处理流水线缓解这些问题:基于覆盖的分块分解、黑盒VLM修补、几何和颜色一致性校正、基于接缝风险的多候选排序以及动态规划曲线接缝融合。该流水线降低了接缝可见性,并支持任意分块区域的语义修改。

英文摘要

Semantic region editing for large images must satisfy two requirements at the same time: high generative quality and natural integration with surrounding content. Some related methods rely on white-box models and leave the strong generation capability of closed-source models underexplored. Directly applying closed-source models to tiled editing, however, introduces several failure modes: semantic deformation, canvas-level alignment drift, and visible seam artifacts. This paper presents SeamEdit, a training-free and model-agnostic pipeline that treats any VLM with inpainting capability as a black-box oracle. SeamEdit mitigates these issues through a five-stage post-hoc pipeline: overlay-based tile decomposition, black-box VLM inpainting, geometric and color-consistency correction, seam-risk-based multi-candidate ranking, and dynamic-programming curved seam fusion. The pipeline reduces seam visibility and supports semantic modification of arbitrary tile regions.
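The abstract does not specify how the dynamic-programming curved seam is computed. As an illustration of the general idea only, the sketch below finds a minimum-cost top-to-bottom seam through a cost map, in the style of classic seam carving. The cost map (for example, per-pixel disagreement between an edited tile and the original canvas in their overlap) and the function name are assumptions, not details from the paper.

```python
def min_cost_seam(cost):
    """Return one column index per row describing the connected
    top-to-bottom path (moving at most one column per row) with the
    smallest total cost."""
    rows, cols = len(cost), len(cost[0])

    # Forward pass: acc[r][c] is the cheapest path cost ending at (r, c).
    acc = [list(cost[0])]
    for r in range(1, rows):
        prev = acc[-1]
        row = []
        for c in range(cols):
            lo, hi = max(0, c - 1), min(cols, c + 2)
            row.append(cost[r][c] + min(prev[lo:hi]))
        acc.append(row)

    # Backward pass: start at the cheapest end point and walk upward.
    col = min(range(cols), key=lambda c: acc[-1][c])
    seam = [col]
    for r in range(rows - 1, 0, -1):
        lo, hi = max(0, col - 1), min(cols, col + 2)
        col = min(range(lo, hi), key=lambda c: acc[r - 1][c])
        seam.append(col)
    seam.reverse()
    return seam


if __name__ == "__main__":
    # Hypothetical disagreement map over a 3x3 overlap region.
    cost = [
        [5, 1, 9],
        [9, 1, 9],
        [9, 9, 1],
    ]
    print(min_cost_seam(cost))
```

Blending the two images along such a path, rather than along a straight tile edge, places the transition where the images already agree, which is the usual motivation for curved seams.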

2606.13040 2026-06-12 cs.RO 新提交

RoboProcessBench: Benchmarking Process-Aware Understanding in Vision-Language Robotic Manipulation

RoboProcessBench:视觉语言机器人操作中的过程感知理解基准测试

Dayu Xia, Yue Shi, Yao Mu, Huiting Ji, Chaofan Ma, Yingjie Zhou, Hua Chen, Yang Liu, Jiezhang Cao, Guangtao Zhai

发表机构 * Shanghai AI Laboratory(上海人工智能实验室) Zhejiang University(浙江大学) Shanghai Jiao Tong University(上海交通大学) Tsinghua University(清华大学) China University of Mining Technology(中国矿业大学)

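AI summary: Proposes the RoboProcessBench benchmark, which evaluates process-aware understanding of robotic manipulation in vision-language models along two dimensions (static monitoring and dynamic reasoning) and 12 diagnostic question families, and uses a dataset of about 58k question-answer pairs to show the limitations of current models and the effectiveness of post-training.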

详情
AI中文摘要

视觉语言模型(VLM)正越来越多地被探索作为机器人操作中的视觉评判者、奖励生成器和故障检测器。这些角色隐含地要求模型不仅判断最终任务成功与否,还要判断操作执行在物理和时间上的进展。然而,现有评估未能测试VLM是否具备细粒度的过程理解。为填补这一空白,我们提出了RoboProcessBench,一个用于视觉语言机器人操作中过程感知理解的基准测试。RoboProcessBench将这种能力分解为两个互补维度:\emph{静态监控}和\emph{动态推理},具体化为12个诊断问题家族,涵盖阶段、接触、运动、协调、原始局部进展、时间顺序、结果和原始级转换。基于物理基础的执行轨迹,构建的基准语料库ProcessData包含约58k个问答对,涵盖260个操作任务,进一步分为ProcessData-SFT和ProcessData-Eval,分别用于后训练和评估。对ProcessData-Eval上各种VLM的广泛评估揭示了12个诊断任务家族的普遍局限性,表明当前模型仍缺乏对操作执行的鲁棒过程感知理解。但通过ProcessData-SFT,后训练的\textit{Qwen2.5-VL-7B}和\textit{InternVL-3-8B}在局部状态、运动、进展和原始级线索上表现出持续改进。这些结果表明,RoboProcessBench既可作为评估基准,也可作为可学习的监督源,用于开发能够监控和评估机器人操作过程的VLM。项目网页:\href{ this https URL }{ this https URL }。

英文摘要

Vision-language models (VLMs) are increasingly explored as visual critics, reward generators, and failure detectors in robotic manipulation. These roles implicitly require models to judge not only final task success, but also how a manipulation execution is physically and temporally progressing. However, existing evaluations fail to test whether VLMs possess fine-grained process understanding. To address this gap, we present RoboProcessBench, a benchmark for process-aware understanding in vision-language robotic manipulation. RoboProcessBench decomposes such capability into two complementary dimensions, static monitoring and dynamic reasoning, instantiated as 12 diagnostic question families covering phase, contact, motion, coordination, primitive-local progress, temporal order, outcome, and primitive-level transitions. Built from physically grounded execution traces, the curated benchmark corpus ProcessData contains ~58k question-answer pairs across 260 manipulation tasks, which is further split into ProcessData-SFT and ProcessData-Eval for post-training and evaluation purposes. Extensive evaluation of various VLMs on ProcessData-Eval reveals broad limitations across 12 diagnostic task families, suggesting current models still lack robust process-aware understanding of manipulation executions. But with ProcessData-SFT, the post-trained Qwen2.5-VL-7B and InternVL-3-8B exhibit consistent gains on local state, motion, progress, and primitive-aware cues. These results demonstrate that RoboProcessBench serves as both an evaluation benchmark and a learnable supervision source for developing VLMs capable of monitoring and evaluating robotic manipulation processes. Project webpage: this https URL.

2606.13039 2026-06-12 cs.CY cs.AI cs.HC 新提交

Fault Lines: Navigating Ethics and Responsible AI Where National Policy Meets Local Practice in Public Sector Transformation

断层线:在公共部门转型中国家政策与地方实践交汇处的伦理与负责任AI导航

Sitong Lyu, Shabnam Taghiyeva, Mohit Kukadia, Denis Newman-Griffis

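AI summary: Using Special Educational Needs and Disabilities (SEND) in the UK as a case study, a thematic analysis of 17 semi-structured interviews identifies five challenges for implementing responsible AI where national policy meets local practice, and proposes policy and structural reforms.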

详情
Comments
10 pages plus references. This study was funded by the University of Sheffield
AI中文摘要

英国政府采取了支持AI的立场,以帮助在严重财政压力下转变公共服务交付,但将这一愿景转化为负责任的AI实践的道路仍然不明确。虽然英国政策通常在国家层面制定,但地方当局负责大多数公共服务交付,而公共部门中AI优先叙事的快速推进正在暴露这一国家-地方接口在知识和实践方面的断层线。本文以高风险的特殊教育需求与残疾(SEND)领域为案例,研究英国中央政府与地方当局之间接口处负责任AI的解释和实施方式。我们对17位政策制定者、从业者和第三部门专业人士进行了半结构化访谈,并进行了主题分析,以识别在国家政策与地方实践交汇处负责任AI的障碍和促成条件。我们发现了地方当局面临的五个相互关联的挑战:AI的影子使用和数据隐私风险、AI供应中的市场-政府不对称、劳动力准备不足、缺乏标准化定义和测量,以及人类问责制的缺口。针对每个挑战,参与者提出了可操作的步骤,从加强数据保护框架和重新平衡市场-政府关系到提升劳动力能力。我们对SEND的审查使这些挑战更加突出,展示了影响弱势儿童和家庭的高风险决策如何加剧了关于问责制、公平性和人类监督的紧张关系,暴露了基于原则的监管方法的局限性。我们认为,负责任的公共部门AI需要国家政策调整以及地方层面机构能力、价值观和治理机制的结构性改革。

英文摘要

The UK government has adopted a pro-AI stance to help transform public service delivery in the face of severe financial pressures, but the path to translate this vision into responsible AI practice remains ill-defined. While UK policy is often set at the national level, local authorities are responsible for most public service delivery, and the rapid advance of AI-first narratives in the public sector is exposing fault lines in knowledge and practice at this national-local interface. This paper examines how responsible AI is interpreted and implemented at the interface between the UK's central government and local authorities, taking the high-stakes area of Special Educational Needs and Disabilities (SEND) as a case study. We present a thematic analysis of 17 semi-structured interviews with policymakers, practitioners, and third-sector professionals to identify barriers and enabling conditions for responsible AI where national policy meets local practice. We identify five interconnected challenges facing local authorities: shadow usage of AI and data privacy risks, market-government asymmetry in AI provision, insufficient workforce readiness, a lack of standardised definitions and measurements, and gaps in human accountability. For each, participants proposed actionable steps, from strengthening data protection frameworks and rebalancing the market-government relationship to enhancing workforce capacity. Our examination of SEND brings these challenges into sharper focus, showing how high-stakes decisions affecting vulnerable children and families intensify tensions around accountability, fairness, and human oversight, exposing the limits of a principle-based regulatory approach. We argue that responsible public sector AI requires both national policy adjustments and structural reforms to institutional capacity, values, and governance mechanisms at the local level.

2606.13038 2026-06-12 cs.AI new submission

Nous: An Attempt to Extract and Inject the Cognition Behind Prediction-Market Behavior

Nous: 提取并注入预测市场行为背后认知的尝试

Haowei Qian

Affiliations * Independent Researcher

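AI summary To address cognitive homogenization among LLM agents in prediction markets, the paper proposes Nous, which extracts an eight-dimension behavioral profile from real trading behavior and injects it into prompts. It finds that extraction partially works, but prompt injection fails to transmit cognitive diversity.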

Details
Comments
37 pages, 1 figure, 7 tables. Reproduction artifacts (code, frozen profiles, prompts, model outputs): this https URL
AI Chinese abstract

随着LLM代理在预测市场和集体决策中激增,它们面临认知同质化的风险:基于共享基础模型构建的代理产生相关预测,近期测量发现前沿模型错误相关性约为r~0.77。我们探究人类认知多样性是否可以从行为中恢复并转移到LLM代理。Nous从真实的Polymarket交易活动中提取结构化的八维行为画像,并通过提示注入到代理中。我们的核心发现是该流程的两半之间存在分离。提取部分有效:在100个钱包中,14个参数中有8个在时间上稳定(分半ICC >= 0.5,bootstrap CI下限>0.3;逆向得分达到ICC~0.9);钱包从其画像中被识别的概率远高于随机(top-1检索17-22% vs. 1%随机);四个预定义维度中的两个与样本外未来实现利润排名相关,尽管这些相关性在行为混杂控制后不成立。提示级注入无法可测量地传递:在语义嵌入指标上,结构化注入在任何模型上均未显示出比长度匹配控制组显著的优势,并且其诱导的多样性既未降低集成错误相关性,也未改善Brier分数——这一零结果在采样温度、画像多样性和问题难度的探索性检查中持续存在。测量提示本身定位了模型前的压缩:结构到叙述的翻译器发出近乎均匀的提示,其扩散不追踪画像扩散。我们将Nous定位为测量认知同质化问题及提示级补救措施的局限性,从而推动更深层次的提示下注入(微调、激活引导)。代码、冻结画像、提示和模型输出:此 https URL

English abstract

As LLM agents proliferate in prediction markets and collective decision-making, they risk a cognitive monoculture: agents built on shared foundation models produce correlated forecasts, and recent measurement finds frontier-model errors correlated at r ~ 0.77. We ask whether human cognitive diversity can be recovered from behavior and transferred to LLM agents. Nous extracts a structured eight-dimension behavioral profile from real Polymarket trading activity and injects it into agents through prompts. Our central finding is a dissociation between the two halves of that pipeline. Extraction works, partially: across 100 wallets, 8 of 14 parameters are temporally stable (split-half ICC >= 0.5, bootstrap CI lower bound > 0.3; contrarian score reaches ICC ~ 0.9); wallets are identifiable from their profiles well above chance (top-1 retrieval 17-22% vs. 1% chance); and two of four pre-specified dimensions rank-correlate with future realized profit out-of-sample, though the correlations do not survive behavioral-confound controls. Prompt-level injection does not measurably transmit it: on a semantic embedding metric, structured injection shows no significant advantage over a length-matched control on any model, and the diversity it induces neither reduces ensemble error correlation nor improves Brier score -- a null that persists across exploratory checks on sampling temperature, profile diversity, and question difficulty. Measuring the prompts themselves locates the compression before the model: the structure-to-narrative translator emits near-uniform prompts whose spread does not track profile spread. We position Nous as measuring the cognitive-monoculture problem and the limits of a prompt-level remedy, motivating deeper, below-the-prompt injection (fine-tuning, activation steering). Code, frozen profiles, prompts, and model outputs: this https URL
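The two ensemble metrics named in this abstract, Brier score and error correlation, can be illustrated with a minimal sketch. It assumes binary outcomes coded as 0/1, signed forecast errors, and Pearson correlation; the abstract does not state the paper's exact definitions, and all function names here are hypothetical.

```python
from statistics import mean


def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return mean((p - o) ** 2 for p, o in zip(forecasts, outcomes))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def error_correlation(forecasts_a, forecasts_b, outcomes):
    """Correlation between the signed errors of two forecasters (an assumed definition)."""
    errors_a = [p - o for p, o in zip(forecasts_a, outcomes)]
    errors_b = [p - o for p, o in zip(forecasts_b, outcomes)]
    return pearson(errors_a, errors_b)
```

Under these assumptions, a lower Brier score means better-calibrated forecasts, and an error correlation near 1 means the two agents tend to be wrong in the same direction on the same questions, which is the monoculture risk the abstract describes.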

2606.13037 2026-06-12 cs.CR cs.SE new submission

DIG: Oracle-Guided Directed Input Generation for One-Day Vulnerabilities

DIG:面向单日漏洞的Oracle引导定向输入生成

Andrew Bao (University of Minnesota, Twin Cities), Haochen Zeng (University of California, Riverside), Peng Chen (Independent Researcher), Stephen McCamant (University of Minnesota, Twin Cities), Pen-Chung Yew (University of Minnesota, Twin Cities)

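AI summary To address the risk of one-day vulnerabilities caused by delayed or incomplete patch deployment, the paper proposes DIG, an oracle-guided directed input generation method. DIG exploits the triggering conditions that patches reveal, synthesizes an oracle with an LLM, and uses it to guide both generator evolution and directed fuzzing. It triggers 80 vulnerabilities across 138 real-world CVEs, outperforming existing techniques.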

Details
AI Chinese abstract

单日漏洞由于补丁延迟或不完全采用而构成重大风险。因此,生成概念验证(PoC)输入对于评估现实世界影响至关重要。关键挑战在于识别触发漏洞所需的约束并有效求解它们。现有的定向模糊测试方法优先将输入导向目标位置,但既不明确识别必要约束也不有效求解它们,而是依赖目标距离反馈和随机变异。基于代理的方法通过代码推理和结构化输入生成显示出强大潜力,但长程推理中的目标漂移限制了其有效性。DIG通过利用单日漏洞的一个关键属性来解决这一挑战:补丁通常揭示触发的必要前提条件。DIG使用LLM分析补丁并合成一个Oracle,使这些条件显式化。该Oracle在两层支持有效的PoC生成。在高层,DIG执行Oracle引导的生成器进化,其中代理推断并求解约束以满足Oracle。在低层,DIG将Oracle植入目标程序,并使用分支距离反馈指导定向模糊测试中的随机变异。评估显示,DIG在138个真实世界CVE上优于2个最先进的代理和10个模糊测试器。DIG触发了80个漏洞,超越了先前结果,并比最佳基线高出40%(57 vs. 80个CVE)。值得注意的是,DIG独家触发了9个现有技术无法触发的漏洞。与其他工具的平均值相比,DIG在92.9%的案例中更快触发漏洞,在48.8%的案例中实现了超过100倍的加速,最大加速比为3,664倍。除了单日PoC生成,DIG还在广泛部署的库中发现了6个先前未知的漏洞,实现了零日发现。

English abstract

One-day vulnerabilities pose significant risks due to delayed or incomplete patch adoption. Generating proof-of-concept (PoC) inputs is therefore essential for assessing real-world impact. The key challenge is identifying necessary constraints for triggering the vulnerability and solving them effectively. Existing directed fuzzing approaches prioritize inputs toward target locations, but neither explicitly identify necessary constraints nor solve them effectively, relying instead on target-distance feedback and random mutation. Agentic approaches show strong potential through code reasoning and structured input generation, but goal drift in long-horizon reasoning limits their effectiveness. DIG addresses this challenge by exploiting a key property of one-day vulnerabilities: patches often reveal necessary preconditions for triggering. DIG uses an LLM to analyze the patch and synthesize an oracle making these conditions explicit. The oracle supports effective PoC generation at two levels. At the high level, DIG performs oracle-guided generator evolution, where an agent infers and solves constraints to satisfy the oracle. At the low level, DIG instruments the oracle into the target program and uses branch-distance feedback to guide random mutation in directed fuzzing. Evaluation shows DIG outperforms 2 state-of-the-art agents and 10 fuzzers across 138 real-world CVEs. DIG triggers 80 vulnerabilities, surpassing prior results and outperforming the best baseline by 40% (57 vs. 80 CVEs). Notably, DIG exclusively triggers 9 vulnerabilities no existing technique can trigger. Compared to the average of other tools, DIG triggers vulnerabilities faster in 92.9% of cases, achieving over 100x speedup in 48.8% of cases, with a maximum speedup of 3,664x. Beyond one-day PoC generation, DIG uncovers 6 previously unknown vulnerabilities in widely deployed libraries, enabling zero-day discovery.
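The branch-distance feedback mentioned in this abstract can be illustrated with a minimal sketch. It uses the classic branch-distance formulation from search-based testing, which may differ from what DIG implements. The oracle condition (tag byte equal to 0x2A and a declared length greater than 64) and all function names are hypothetical.

```python
def branch_distance(op, lhs, rhs, k=1):
    """Return 0 when the comparison holds, else a value that grows with how far it is from holding."""
    if op == "==":
        return abs(lhs - rhs)
    if op == "!=":
        return 0 if lhs != rhs else k
    if op == "<":
        return 0 if lhs < rhs else lhs - rhs + k
    if op == "<=":
        return 0 if lhs <= rhs else lhs - rhs
    if op == ">":
        return 0 if lhs > rhs else rhs - lhs + k
    if op == ">=":
        return 0 if lhs >= rhs else rhs - lhs
    raise ValueError(f"unsupported operator: {op}")


def oracle_distance(data: bytes) -> int:
    """Distance of an input from a hypothetical oracle: tag == 0x2A and declared_len > 64."""
    padded = data.ljust(2, b"\x00")  # treat missing bytes as zero
    tag, declared_len = padded[0], padded[1]
    # A conjunction is satisfied only when every term's distance is zero, so sum the terms.
    return branch_distance("==", tag, 0x2A) + branch_distance(">", declared_len, 64)
```

A fuzzer guided this way would keep mutants whose `oracle_distance` is smaller than their parent's, so random mutation gets a gradient toward the triggering condition instead of a bare reached/not-reached signal.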