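arXivDaily daily academic digest: synced with the full arXiv data feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.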

2606.03511 2026-06-03 eess.SY cs.SY

Enhancing Collective Self-Consumption through Water Storage Heater Flexibility

Pierre-Yves Massé, Maylis Duru, Benoit Couraud, Haicheng Ling, Solal Bizeul, Hugo Roussel, Cléa Verdot, Mariane Vittoz, Estefania Alvarez, Merlinda Andoni, Yann Rozier, Sonam Norbu, David Flynn, Erwin Franquet, Thibault Rihet

AI summary: Through simulation and a real-world deployment, this paper quantifies the benefits of water storage heater flexibility for a collective self-consumption community in France (lower electricity bills and higher self-consumption) and analyses the factors that shape user acceptance.

Abstract

While Renewable Energy Communities (RECs) and Collective Self-Consumption (CSC) schemes have emerged as promising tools to accelerate renewable energy adoption and support the net-zero transition, their full potential can only be realised when complemented by demand-side flexibility that aligns consumption with renewable generation. Water storage heaters can function as distributed thermal storage, absorbing excess renewable energy at the community level. This work quantifies both the benefits of water storage heater flexibility for energy consumers in a CSC community in France (such as lower energy bills and higher self-consumption) and the challenges related to implementation and user acceptance. In the first stage, an annual simulation is performed on a community of 41 households and a large solar PV plant, comparing three scenarios: no CSC community, a standard CSC community, and a CSC community with flexibility from water storage heaters. The simulation shows that flexibility yields an average benefit of 70 euro/year per household and increases solar PV community self-consumption and self-production by 6% and 22%, respectively. In the second stage, we present the results of the real-world deployment in the community, analysing its technical performance and user reception, and examine the main factors shaping user engagement and satisfaction.
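
The two community metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes the common definitions (self-consumption is the share of PV production used locally, self-production is the share of consumption covered by local PV); the abstract does not spell out the paper's exact formulas, and the function name and the four-interval profiles are hypothetical.

```python
def community_rates(pv_kwh, load_kwh):
    """Return (self_consumption, self_production) for aligned per-interval series.

    Assumed definitions, not taken from the paper:
      self-consumption = locally used PV energy / total PV production
      self-production  = locally used PV energy / total community consumption
    """
    if len(pv_kwh) != len(load_kwh):
        raise ValueError("series must have the same length")
    # In each interval the community can use at most what is produced and needed.
    used_locally = sum(min(p, l) for p, l in zip(pv_kwh, load_kwh))
    return used_locally / sum(pv_kwh), used_locally / sum(load_kwh)


# Hypothetical profiles (kWh per interval): night, morning, midday, evening.
pv = [0.0, 4.0, 6.0, 0.0]
load_baseline = [2.0, 1.0, 2.0, 3.0]
# Same total demand, but water heating moved from night/evening to PV hours.
load_shifted = [1.0, 2.0, 4.0, 1.0]

baseline_rates = community_rates(pv, load_baseline)
shifted_rates = community_rates(pv, load_shifted)
```

In this toy profile, shifting heater demand into PV hours raises both rates without changing total consumption, which is the mechanism the abstract attributes to water storage heater flexibility.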

2606.03502 2026-06-03 cs.DB

A Community Survey on SHACL and ShEx: Bridging Gaps in RDF Validation

Maxime Jakubowski, Dominik Tomaszuk, Katja Hose

AI summary: Through a community survey, this paper analyses the practices and challenges of RDF validation and identifies areas for improvement in technologies and methodologies, in order to guide future research and standardization.

Comments: Presented at SEMANTiCS 2025

DOI: 10.3233/SSW250011
Journal ref: SEMANTiCS 2025: 70-84

Abstract

This paper examines RDF validation practices and challenges to understand stakeholder applications, their needs, and identify areas for improvement in technologies and methodologies, thereby guiding future research and standardization efforts. A community survey was conducted, targeting a diverse group of RDF validation technology users across academia and industry. The survey collected data on current practices, tool usage, perceived benefits, limitations, and desired enhancements to gain a broad overview of the validation landscape. Our analysis shows that while RDF validation is widely adopted and valued for enhancing data quality, significant challenges remain. In particular, users report a need for better documentation, improved tool support, enhanced performance, and greater language expressiveness to handle complex large-scale validation tasks effectively. This work provides crucial insights into the RDF validation landscape, highlighting current practices and key areas for development. It offers a foundation for researchers, developers, and standardization bodies to address current limitations and advance validation technologies, ultimately improving data quality and usability in knowledge graphs.

2606.03492 2026-06-03 cs.HC

The Attention-Aware Pipeline: Design Tensions from Making Attention Visible in XR

Arvind Srinivasan, Niklas Elmqvist

AI summary: This paper proposes the Attention-Aware Pipeline (Capture, Record, Revisualize), explores the design tensions of making gaze visible in XR through three systems (attention as mirror, medium, and mediator), and uses an eye-tracking study to confirm the need for intervention.

Abstract

Where people look during shared activity carries coordination cues that speech and gesture cannot replace, but these patterns remain invisible to participants. XR headsets make gaze available as real-time input, yet few systems feed it back visually. We frame our work using the Attention-Aware Pipeline (Capture, Record, Revisualize), whose feedback loop means the system's visual response alters what users attend to next, triggering further responses. This generates design tensions whose form depends on each stage's configuration. We trace the pipeline through three systems casting attention as a mirror (reflecting gaze history), a medium (sharing it across collaborators), and a mediator (intervening through diminished reality). Each encountered a tension the loop predicted, motivating the next. A formative eye-tracking study of four musicians surfaced attentional tunneling and near-total disconnection, confirming the need for intervention. We present these tensions and a next step: testing whether subtractive intervention reduces tunneling for a single sight-reader.

2606.03485 2026-06-03 cs.HC

Analyzing Visual Attention Patterns During Band Rehearsal with Mobile Eye Tracking

Arvind Srinivasan, Tobias Rau, Michael Sedlmair

AI summary: Using mobile eye tracking, this study analyses the gaze behaviour of four band members during rehearsal, finds a hub-and-spoke attention topology centred on the session leader, and quantifies how gaze stabilizes as the musicians learn.

Abstract

Visual attention is central to ensemble coordination, yet how musicians allocate gaze during naturalistic rehearsal remains poorly understood. We present a pilot study using mobile eye tracking to examine gaze behaviour in a four-member band across three songs, each practiced twice. Musicians wore Pupil Labs Neon eye trackers, and YOLOv8-assisted scene annotations mapped fixations to ensemble members and objects in view. Analyzing fixation matrices, transition matrices, temporal scarf plots, and dwell-transition correlations, we uncover a hub-and-spoke attention topology: the session leader was the dominant gaze target for all members, while the learning guitarist concentrated up to 97% of interpersonal dwell on this single reference. Between attempts, gaze transitions decreased by up to 65% on average for unfamiliar material (up to 82% for individual participants) as scanning stabilized. Scarf plots reveal how teaching breakdowns fragment attention and uninterrupted runs consolidate it. Post-session participant reflections align with the quantitative patterns, and we discuss implications for gaze-aware tools in ensemble pedagogy.
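
Two of the quantities mentioned in the abstract, transition matrices and dwell shares, can be sketched from a labelled fixation sequence. This is a minimal illustration only: the function names, the area-of-interest labels, and the durations are hypothetical, and the paper's actual processing pipeline (Pupil Labs Neon recordings with YOLOv8-assisted annotation) is not reproduced here.

```python
from collections import Counter


def transition_counts(targets):
    """Count gaze transitions between consecutive, different areas of interest."""
    counts = Counter()
    for src, dst in zip(targets, targets[1:]):
        if src != dst:  # staying on the same target is dwell, not a transition
            counts[(src, dst)] += 1
    return counts


def dwell_share(targets, durations_ms):
    """Return the fraction of total fixation time spent on each target."""
    totals = Counter()
    for target, duration in zip(targets, durations_ms):
        totals[target] += duration
    grand_total = sum(totals.values())
    return {target: value / grand_total for target, value in totals.items()}


# Hypothetical fixation sequence for one musician.
targets = ["leader", "score", "leader", "leader", "drummer", "leader"]
durations_ms = [400, 200, 300, 300, 100, 700]

transitions = transition_counts(targets)
shares = dwell_share(targets, durations_ms)
```

A hub-and-spoke pattern would show up here as most transitions starting or ending at one target and that target holding most of the dwell share.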

2606.03484 2026-06-03 cs.LO math.LO

Optimizing Proof-Search via Linearization for Gödel-Löb Logic with Tree-Hypersequents

Tim S. Lyon, Omar Taher

AI summary: Addressing the questions of a decidability proof and a PSPACE proof-search algorithm for GL in the tree-hypersequent system CSGL, the paper proposes a linearization method that builds only a single branch of a derivation and of a tree sequent at a time, achieves optimal PSPACE complexity, and extracts finite counter-models to establish the algorithm's correctness.

Comments: in review

Abstract

We answer a question posed by Poggiolesi concerning a syntactic decidability proof for GL in the tree-hypersequent system CSGL, and resolve a challenge identified by Maggesi and Perini Brogi, who sought a PSPACE proof-search algorithm for GL in expressive sequent-based formalisms. We work with a notational variant of CSGL formulated in terms of (labeled) tree sequents. Our answer is complexity-optimal: we present a proof-search algorithm that decides the (in)validity of formulae and runs in PSPACE, matching the known PSPACE-completeness of GL. To achieve this, we introduce a "linearization method," which constructs only a single branch of a derivation and of a tree sequent at a time, avoiding the exponential blowup typical of naive proof-search in sequent formalisms. We show how to systematically combine fragments of tree sequents generated during proof-search to extract finite counter-models, which serves as a theoretical device for establishing the correctness of the algorithm when proof-search fails. Finally, we show that every valid formula admits a proof consisting solely of line sequents, which correspond to linear nested sequents. This establishes a connection between depth-first proof-search and linear nested sequent calculi. Our results not only answer the aforementioned questions, but also provide new insights into proof-search and correctness arguments in tree sequent systems for modal logics.

2606.03475 2026-06-03 eess.SY cs.SY

Surrogate Modeling of Interconnector Flows: A Machine Learning Alternative to Full-Scale Power System Simulations with Application to Cross-Border Electricity Exchange

Robert Gaugl, Eloy Insunza, José Portela, Sonja Wogrin

AI summary: The paper proposes a machine-learning surrogate framework that maps nodal time series data to synthetic interconnector flow series and introduces a custom loss function that penalizes infeasible flow patterns, achieving a speed-up of roughly 500x while retaining decision-relevant accuracy.

Abstract

Cross-border electricity exchanges are crucial for operating and planning highly renewable power systems. Many studies reduce spatial granularity to keep the models tractable and prescribe cross-border exchanges exogenously, often by reusing historical import/export time series. Such assumptions become inconsistent as renewable penetration changes the magnitude and timing of flows. This paper proposes a machine-learning (ML) surrogate framework that maps available nodal time series data (e.g., hourly demand and renewable generation) to synthetic, interconnector-level flow time series. The goal is to provide consistent flow profiles that are used as fixed boundary conditions in reduced power system optimization models (PSOMs). To improve downstream feasibility when surrogate flows are imposed in optimization, we further introduce a custom loss for the neural-network surrogate that penalizes physically impossible flow patterns. We demonstrate the framework on a pan-European single-node per country DC optimal power flow setting using the open-source LEGO PSOM with ENTSO-E TYNDP 2024 National Trends assumptions for 2030. We assess two model classes: k-nearest neighbors (KNN) and feedforward neural networks (SQU), using both full and reduced feature sets. The SQU models generalize more robustly than KNN to unseen climate years and substantially improve upon scaled historical benchmarks in terms of predictive accuracy. When imposed as fixed boundary flows in single-node PSOMs, the ML-generated profiles produce outcomes that closely match the results of the full European simulation, while delivering substantial runtime reductions (up to ~500x). These results indicate that ML-based flow surrogates can provide decision-relevant interconnector flows for tractable reduced studies in high-renewable systems.

2606.03464 2026-06-03 cs.DC

Predicting Lakehouse Performance in Clouds: An Empirical Exploration of Query Runtime Variance

James Nurdin, Wei Liu, Richard Mccreadie, Lauritz Thamsen

AI summary: Through cross-cloud experiments, this paper quantifies the runtime variance of distributed lakehouse queries, analyses its sources (such as data locality, co-tenant load, and caching effects), and assesses how variance affects query performance prediction models and workload management techniques such as low-carbon scheduling.

Comments: 11 pages, 5 figures, to appear in the Proceedings of the 19th IEEE International Conference on Cloud Computing (CLOUD)

Abstract

Data analytics increasingly runs on distributed lakehouse systems, where platform operators must optimise monetary, resource, and environmental costs. Query Performance Prediction (QPP) helps to balance these costs and supports workload management techniques, such as adaptive resource scaling and low-carbon scheduling. However, runtimes in lakehouses can vary substantially, and the impact of runtime variance on QPP accuracy and workload orchestration has not previously been systematically studied for lakehouse systems. This paper addresses this gap by investigating the runtime variance observed for distributed lakehouse analytical queries and its impact on QPP. First, we quantify the run-to-run variance using Kubernetes deployments across three public clouds and one private cloud, spanning multiple database scales and three analytical benchmarks. Our results demonstrate that repeated executions of the same query can vary in runtime by nearly twofold. Second, we conduct a factor analysis study assessing key sources of this runtime variance, such as data locality, co-tenant load, and caching effects. Third, we examine how variance influences state-of-the-art QPP models, revealing that addressing key sources of variance can reduce prediction error by up to 80%. Finally, we demonstrate the downstream implications for low-carbon scheduling as an example of a workload management technique that relies on performance prediction, showing that accounting for runtime variance can lead to a significant reduction in carbon costs.
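
Run-to-run variance of a single query can be summarised in a few lines of Python. The sketch below uses two generic spread measures, the coefficient of variation and the max/min runtime ratio; these are assumptions chosen for illustration, the abstract does not say which statistics the paper reports, and the runtimes are made up.

```python
import statistics


def runtime_spread(runtimes_s):
    """Return (coefficient_of_variation, max_over_min) for repeated runs of one query."""
    mean = statistics.mean(runtimes_s)
    cv = statistics.stdev(runtimes_s) / mean  # sample standard deviation relative to the mean
    ratio = max(runtimes_s) / min(runtimes_s)  # slowest run relative to fastest run
    return cv, ratio


# Hypothetical runtimes (seconds) of the same query executed four times.
runs = [10.0, 12.0, 14.0, 19.0]
cv, ratio = runtime_spread(runs)
```

A max/min ratio close to 2 corresponds to the "nearly twofold" difference between repeated executions described in the abstract.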

2606.03434 2026-06-03 cs.CR

Signals and Spoils: Speculative Oracle Extractable Value in the Era of Cross-Chain Interoperability

Hasret Ozan Sevim, Christof Ferreira Torres

AI summary: The paper studies the detection and exploitation of speculative Oracle Extractable Value (OEV) in cross-chain settings, proposes a detection methodology, finds a large number of speculative liquidations on Arbitrum, Base, and Optimism, and shows that oracle update latency can be exploited for cross-chain OEV extraction.

Abstract

A new form of Maximal Extractable Value (MEV), termed speculative MEV, has emerged across Layer-2 blockchains. Unlike Ethereum mainnet, many Layer-2 systems lack a public mempool, forcing extraction strategies to become probabilistic: searchers emit multiple identical transactions hoping to capture an opportunity first. This generates substantial transaction spam, increasing fees and wasting block space. We investigate speculative Oracle Extractable Value (OEV), a form of MEV associated with liquidating undercollateralized loans via speculative backrunning of oracle price updates. We propose a methodology for detecting speculative liquidations in the wild and apply it across Arbitrum, Base, and Optimism. On October 10, 2025, we identify 64 speculative liquidators on Aave (57% of all detected liquidators) and 831 successful speculative liquidations (39% of all successful liquidations across the three chains). We further examine whether latency differences in oracle price feed updates across blockchains can be exploited for cross-chain OEV. Specifically, we ask whether a searcher can observe oracle updates on one chain and frontrun liquidation opportunities on another. We systematically analyze Chainlink Decentralized Oracle Network (DON) configurations (deviation thresholds, heartbeat intervals, and submitted price observations) across Arbitrum, Base, Ethereum, and Optimism. Our dataset comprises 63 Chainlink feeds, 12,009 price updates, and over 100,000 oracle observations linked to 2,986 Aave liquidations. We show that independent DONs consume largely identical off-chain price data nearly simultaneously yet publish updates at different times, creating statistically predictable cross-chain exploitation windows. We demonstrate that Chainlink updates on Optimism can predict subsequent updates on Arbitrum and Base, enabling speculative cross-chain OEV extraction.
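
The role of deviation thresholds and heartbeat intervals can be illustrated with a toy model. The sketch assumes the commonly described trigger rule for price feeds (publish when the off-chain price deviates from the last on-chain price by at least a threshold, or when a heartbeat interval has elapsed) and assumes that two chains differ only in their threshold. The function name, prices, and parameter values are hypothetical, and this is not the paper's detection methodology.

```python
def first_update_step(offchain_prices, onchain_price, deviation_threshold, heartbeat_steps):
    """Return the first step at which a feed would publish a new on-chain price.

    Steps are counted from 1 after the last on-chain update. Returns None if
    neither the deviation rule nor the heartbeat rule fires within the series.
    """
    for step, price in enumerate(offchain_prices, start=1):
        deviation = abs(price - onchain_price) / onchain_price
        if deviation >= deviation_threshold or step >= heartbeat_steps:
            return step
    return None


# Both feeds observe the same hypothetical off-chain price path.
prices = [100.3, 100.6, 100.9, 101.2]

# Hypothetical configurations: 0.5% vs 1% deviation threshold, same heartbeat.
fast_chain_step = first_update_step(prices, 100.0, 0.005, 60)
slow_chain_step = first_update_step(prices, 100.0, 0.010, 60)

# Steps during which the fast chain's update is visible before the slow chain publishes.
window = slow_chain_step - fast_chain_step
```

In this toy setting the feed with the tighter threshold publishes first, so its update acts as a signal for the other chain; the abstract reports a comparable lead-lag relationship between Optimism and Arbitrum/Base, without attributing it to this specific cause.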

2606.03427 2026-06-03 cs.SE

Multi-Modal Assessment of Road Roughness Using Smartphone Applications, Acceleration, and Passenger Ratings

Novel Certada, Amirhesam Aghanouri, Joseba Gorospe, Cristina Olaverri-Monreal

AI summary: The paper proposes a low-cost, multi-modal road roughness assessment framework based on smartphone IRI estimates, in-vehicle GNSS-IMU measurements, and passenger PSR ratings, and examines the correlation between the data sources and the link between perceived and physical metrics.

Comments: 6 pages

Abstract

This paper investigates a multi-modal and human-centric framework for low-cost road roughness assessment. The evaluation was based on three complementary data sources: smartphone-based International Roughness Index (IRI) estimates from two independent smartphone applications; in-vehicle GNSS-IMU receiver (Global Navigation Satellite System receiver with Inertial Measurement Unit) measurements; and passenger Present Serviceability Ratings (PSR). Data were collected over 1700 km across Austria, Hungary, and Romania under real traffic conditions. Inter-application agreement was evaluated using correlation analysis, Intraclass Correlation Coefficient (ICC), and Bland-Altman methods. While the two smartphone applications show strong correlation, systematic bias limits their interchangeability. A significant inverse relationship between IRI and PSR confirms perceptual sensitivity to roughness, and positive correlations between IRI and vertical acceleration validate the physical linkage between pavement irregularities and vehicle dynamics. The results demonstrate the challenges of integrating consumer-grade sensing and perception-based evaluation for road roughness monitoring as an alternative to high-cost specialized survey equipment.
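
The distinction between strong correlation and interchangeability can be made concrete with a Bland-Altman computation. The sketch below is a standard textbook form (mean difference as bias, limits of agreement at bias plus or minus 1.96 sample standard deviations); the IRI values for the two applications are invented for illustration and are not data from the paper.

```python
import statistics


def bland_altman(a, b):
    """Return (bias, lower_limit, upper_limit) for paired measurements a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)   # systematic offset between the two methods
    sd = statistics.stdev(diffs)    # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical IRI estimates (m/km) from two smartphone apps on the same road segments.
app_a = [2.0, 3.0, 4.0, 5.0]
app_b = [1.5, 2.6, 3.4, 4.5]

bias, lower, upper = bland_altman(app_a, app_b)
```

Here the two series rise together almost perfectly, yet the limits of agreement do not include zero: one app reads consistently higher, which is the kind of systematic bias that limits interchangeability.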

2606.03422 2026-06-03 cs.CE

HonestAffinity: Leak-Aware Evaluation of Protein and Pocket Priors for Binding Affinity Prediction

Junhao Wei, Baili Lu, Zhenhong Peng, Wanyan Li, Zhirong Huang, Yanxiao Li, Yifu Zhao, Dexing Yao, Haochen Li, Xudong Ye, Sio-Kei Im, Yapeng Wang, Xu Yang

AI summary: The paper proposes HonestAffinity, a compact 1D-input predictor that evaluates two priors under a leak-aware protocol (frozen ESM-2 protein embeddings and a learned binary pocket-position marker), and finds that the effect of each prior depends on how the data are split.

Abstract

Sequence-based deep learning offers a scalable alternative to structure-based scoring for protein-ligand binding affinity prediction. However, progress is hard to interpret when architectural priors are evaluated on canonical PDBbind-style splits that leak similarity classes across folds. We present HonestAffinity, a compact 1D-input predictor to isolate two priors under a leak-aware protocol: frozen ESM-2 (650M) protein embeddings and a learned binary pocket-position marker. We evaluate a multi-scale convolutional/Transformer template in three variants: HonestAffinity-Pocket, HonestAffinity-NoPocket, and HonestAffinity-Pocket-NoESM. All three train on 11,513 LP-PDBBind complexes in ~3 GPU-hours. We benchmark against five baselines on the LP-PDBBind 3-tier no-leak hold-out, CASF-2016, and a CASF-2016 non-train subset. Our central finding is a split-conditioned reversal rather than a uniformly best prior: HonestAffinity-Pocket achieves the best mean Pearson R on validation and CASF-2016 splits, whereas HonestAffinity-Pocket-NoESM achieves the best mean Pearson R on every strict LP no-leak tier (test_cl1-cl3). Both the pocket marker and ESM-2 input improve performance on familiar splits but reduce Pearson R on strict no-leak tiers. We argue models should report paired canonical and leak-proof ablations, and that deployment-regime-matched variants better describe these reversals than a single default. Code and scripts are linked in the footnote; checkpoints will be released upon acceptance.

2606.03416 2026-06-03 cs.MA

MeDxAgent: Multi-Agent Consultation for Interactive Medical Diagnosis

Akshat Sanghvi, Naren Akash, Raza Imam, Amit Sharma, Mohit Jain

AI summary: To address the disconnect between existing LLM diagnosis evaluations and clinical practice, the paper proposes the multi-agent consultation system MeDxAgent and the benchmark MeDxBench, and reports a 10.3% accuracy gain from an interactive diagnostic workflow.

Comments: 28 pages, 6 figures

Abstract

Large language models (LLMs) are increasingly used for health-related decision support. Yet most evaluations treat diagnosis as a single-shot task with complete information provided upfront, often as a multiple-choice selection. This diverges from clinical practice, where diagnosis is interactive and open-ended, involving sequential hypothesis refinement through targeted questioning. We address this gap. We build MeDxBench, a large-scale benchmark of 4,421 clinical cases across 20 specialties. We further propose MeDxAgent, a multi-agent consultation system for interactive diagnosis, and systematically study its prompt-, flow- and agent-level design choices. MeDxAgent achieves a 10.3% accuracy gain over the baseline on MeDxBench, closing 52.3% of the gap to a full-information oracle. We find that specific design choices (collecting demographics first, passing summarized dialogue for diagnosis, and feeding candidate diagnoses for targeted questioning) improve accuracy, mirroring how physicians reason, though their effect emerges fully only in combination. Code and dataset will be released upon publication.

2606.03413 2026-06-03 cs.LO math.LO

Non-Wellfounded and Cyclic Proofs for LTL: A Syntactic Correspondence with Linear Nested Sequents

Tim S. Lyon, Lukas Zenger

AI summary: For linear temporal logic (LTL), this paper introduces and studies non-wellfounded and cyclic linear nested sequent calculi and, using saturation recurrence and a technique that shifts rule applications forward, addresses two central problems: extracting cyclic proofs from non-wellfounded proofs (cycle recognition) and the converse transformation (unraveling).

Comments: In Review

Abstract

We introduce and investigate non-wellfounded and cyclic linear nested sequent calculi, and, as a case study, develop such systems for linear temporal logic (LTL). The paper addresses two central problems, which we call 'cycle recognition' and 'unraveling.' Cycle recognition concerns identifying cycles in non-wellfounded proofs in order to extract corresponding cyclic proofs, while unraveling studies the converse transformation, from cyclic proofs to non-wellfounded ones. Although these processes are well understood for Gentzen sequents, they have received little attention for more expressive sequent formalisms and become more challenging in the linear nested sequent setting. To address cycle recognition, we show the completeness of non-wellfounded proofs relative to a particular normal form exhibiting a property we call 'saturation recurrence,' which enables the systematic extraction of cyclic proofs. To address unraveling, we introduce a specialized procedure that shifts rule applications forward along linear nested sequents, allowing non-wellfounded proofs to be reconstructed from cyclic ones. Overall, our work provides new proof-theoretic techniques for cycle recognition and unraveling in expressive multisequent formalisms.

2606.03394 2026-06-03 cs.SE

Human-AI Collaboration and the Transformation of Software Engineering Work

Mamdouh Alenezi

AI summary: Using a structured interpretive synthesis, this paper examines how Generative AI and Agentic AI are shifting software engineering from an activity centred on human authorship of code to one centred on directing, verifying, and governing autonomous and semi-autonomous systems, and proposes a competency framework for future engineers with five dimensions: technical, cognitive, socio-technical, governance, and organizational.

Abstract

The integration of Generative AI (GenAI) and Agentic AI into software development is reconfiguring software engineering from an activity centered on human authorship of code into a discipline centered on directing, verifying, and governing autonomous and semi-autonomous systems. Drawing on a curated, multi-source evidence base of recent peer-reviewed and archival studies -- including large-scale empirical observations of autonomous coding agents contributing hundreds of thousands of pull requests to open-source repositories -- this paper synthesizes how the locus of engineering work is shifting from individual coding productivity toward human-AI collaboration, agent orchestration, verification and validation, governance, and socio-technical systems thinking. We adopt a structured interpretive synthesis to characterize three coexisting paradigms: Traditional, Generative AI-Enabled, and Agentic AI-Enabled software engineering. We map which traditional activities are being automated, which are being augmented, and which are newly emerging, and we trace plausible role trajectories over the next decade. The paper's principal contribution is an original, theory-driven competency framework that organizes the capabilities required of future engineers into five interacting categories -- technical, cognitive, socio-technical, governance, and organizational -- operationalized through a competency matrix and a transformation framework linking paradigm shifts to capability demands. We derive nine empirically testable propositions and articulate implications for theory, industry workforce transformation, university curricula, and organizational leadership. We argue that, as code becomes abundant, the durable value of the software engineer increasingly resides in intent specification, critical judgment, and accountable oversight rather than in the sheer volume of code produced.

2606.03387 2026-06-03 cs.CR

Bastet: A Fine-Grained Expert-Labeled Dataset for DeFi Smart Contract Vulnerability Detection

Wan-Hsuan Hsu, Wei-Hsin Wang, Cheng-Yu Liou, Ting-Rui Ke, Kentaroh Toyoda

AI summary: To address outdated Solidity versions, noisy automated labels, and coarse-grained labelling in existing datasets, the paper presents Bastet, a DeFi smart contract vulnerability dataset built from real-world audit reports and expert consensus annotation, with a fine-grained two-layer taxonomy (46 tags and 77 subtags).

Abstract

Smart contract vulnerabilities in Decentralized Finance (DeFi) protocols resulted in over 1.49 billion USD in confirmed losses in 2024 alone, across 192 incidents [1]. As LLM-based vulnerability detection emerges as a promising approach to address these threats, the quality of evaluation datasets has become a critical bottleneck. Existing datasets suffer from three fundamental problems: they are built on outdated Solidity versions (e.g., v0.4) that no longer reflect modern DeFi contracts [5][6][7]; they rely on automated or LLM-generated annotations that introduce hallucination-driven label noise [9][10]; and they apply coarse single-layer labeling that fails to capture the semantic complexity of real-world business logic vulnerabilities [6][7][11][12]. We present Bastet, an expert-labeled DeFi smart contract vulnerability dataset that addresses all three problems through real-world audit findings (2021-2024), human expert annotation with discussion-based consensus, and a two-layer taxonomy of 46 Tags and 77 Subtags. Bastet comprises 4,402 findings collected from 394 Code4rena competitive audit reports spanning April 2021 to November 2024, of which 849 findings are fully annotated by white-hat security researchers from the DeFiHackLabs community. All annotations are produced through a two-annotator consensus workflow, ensuring label accuracy grounded in real-world vulnerability root causes.

2606.03386 2026-06-03 cs.CR

Operationalizing Cyber Attack Prediction: A Gap-Prioritized Framework with Dataset and Model Selection Guidelines

网络攻击预测的操作化：一个带有数据集和模型选择指南的差距优先级框架

Aminu Muhammad Auwal

AI总结本文提出一个差距优先级框架，通过分析150+基准数据集和200+研究，识别并优先处理五个实施障碍，并提供数据集质量评估框架和实用路线图，以弥合网络攻击预测研究与实践之间的差距。

AI中文摘要

尽管用于网络攻击预测的AI和机器学习已经取得进展，但理论研究与实际操作部署之间仍存在关键差距。基于Ankalaki等人（2025）的工作，本文对150多个基准数据集和200多项研究进行了全面分析，以识别并优先处理五个实施障碍：（1）时间数据集过时，（2）攻击范围狭窄，（3）实时模型可解释性，（4）对抗鲁棒性不足，以及（5）隐私/伦理问题。我们引入了一个新颖的差距优先级框架，根据检测影响、实施成本和修复时间评估这些限制。我们的分析将数据集过时和对抗鲁棒性确定为最高优先级的差距，同时强调模型可解释性是资源受限环境中成本效益最高的路径。为了弥合研究与实践之间的鸿沟，我们提供了一个实用的实施路线图和一个数据集质量评估框架，将45个基准分类为生产就绪、仅限研究和不可用三类。这项工作将学术发现转化为可操作的决策支持工具，用于健壮的、面向生产的AI驱动网络防御。

英文摘要

While AI and machine learning for cyber attack prediction have advanced, a critical gap persists between theoretical research and practical operational deployment. Building on Ankalaki et al. (2025), this paper provides a comprehensive analysis of 150+ benchmark datasets and 200+ studies to identify and prioritize five implementation hurdles: (1) temporal dataset obsolescence, (2) narrow attack scope, (3) real-time model interpretability, (4) inadequate adversarial robustness, and (5) privacy/ethical concerns. We introduce a novel gap-prioritization framework that evaluates these limitations based on detection impact, implementation cost, and remediation time. Our analysis identifies dataset obsolescence and adversarial robustness as the highest-priority gaps, while highlighting model interpretability as the most cost-effective path for resource-constrained environments. To bridge the research-practice divide, we provide a practical implementation roadmap and a dataset quality assessment framework that classifies 45 benchmarks into production-ready, research-only, and unusable categories. This work translates academic findings into actionable decision-support tools for robust, production-oriented AI-driven cyber defense.

2606.03378 2026-06-03 cs.SE

Neural Change Prediction: Relating Software Changes to Their Effects and Vice Versa

神经变化预测：关联软件变更与其效果，反之亦然

Laura Plein, Souhila Zidane, Jordan Samhi, Andreas Zeller

AI总结提出神经变化预测技术，通过自动变异代码并观察行为变化，学习并预测软件变更与动态效果之间的关联，支持特征定位、软件演化和修复。

AI中文摘要

软件开发的大部分工作围绕着理解软件变更与其效果之间的关系。如果我们能够学习并预测这些关系，这样的预测可以惠及软件工程的多个领域。尽管人工智能的最新进展在软件工程任务中显示出巨大潜力，但在不执行代码的情况下预测代码的语义仍然是一个重大挑战。在本文中，我们提出了神经变化预测，这是一种新颖且基础的技术，用于学习和预测软件变更与其对程序行为的动态影响之间的关联。具体来说，对于给定的程序和测试输入，我们自动对代码应用大量变异，并观察这些变化如何改变程序的输出。从这些（软件变更，行为变化）对中，我们创建模型：（1）对于期望的行为变化，预测代码应在何处以及如何更改（特征定位、软件演化和软件修复）；（2）对于给定的代码变更，预测该代码变更如何影响输出（效果预测）。我们在CSS配置文件上进行了详细的案例研究，并在Python程序上进行了评估，以证明神经变化预测的通用性和广泛适用性。虽然神经变化预测需要大量变异（因此需要大量执行被测程序），但神经变化预测是完全自动的，并且不需要任何关于代码或其语义的先验知识，使其适用于任何可执行且输出可观察的软件工件。

英文摘要

Much of software development revolves around understanding the relationship between software changes and their effects. If we could learn and predict those relationships, such predictions could benefit several areas of software engineering. While recent advances in artificial intelligence have shown great promise in software engineering tasks, predicting the semantics of code without executing it remains a big challenge. In this paper, we present Neural Change Prediction, a novel and fundamental technique to learn and predict associations between software changes and their dynamic effects on program behavior. Specifically, for a given program and test inputs, we automatically apply numerous mutations to the code and observe how these changes alter the program's output. From these (changes to software, changes in behavior)-pairs, we create models that: (1) for a desired change in behavior, predict where and how the code should be changed (feature localization, software evolution, and software repair); and (2) for a given code change, predict how this code change affects the output (effect prediction). We have conducted a detailed case study on CSS configuration files and an evaluation on Python programs to demonstrate the generality and wide applicability of Neural Change Prediction. While Neural Change Prediction requires numerous mutations (and thus numerous executions of the program under test), Neural Change Prediction is fully automatic and does not require any prior knowledge of the code or its semantics, making it applicable to any software artifact that can be executed and whose output can be observed.
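
The data-collection step described in this abstract (mutate the code, rerun it on fixed test inputs, record how the output moves) can be sketched as follows. This is only a minimal illustration under stated assumptions: the toy `price` program, the "bump one integer literal" mutation operator, and all helper names are hypothetical and are not taken from the paper, and the model-training step that the paper builds on top of such pairs is not shown.

```python
import ast

# Hypothetical program under test (not from the paper).
SOURCE = """
def price(quantity):
    unit = 5
    shipping = 3
    return quantity * unit + shipping
"""

def run(source, inputs):
    """Execute the program text and return its outputs on the test inputs."""
    namespace = {}
    exec(compile(source, "<program>", "exec"), namespace)
    return [namespace["price"](x) for x in inputs]

class BumpConstant(ast.NodeTransformer):
    """Add `delta` to the `target`-th integer literal (in visiting order)."""
    def __init__(self, target, delta):
        self.target, self.delta, self.seen = target, delta, -1

    def visit_Constant(self, node):
        if isinstance(node.value, int) and not isinstance(node.value, bool):
            self.seen += 1
            if self.seen == self.target:
                return ast.copy_location(ast.Constant(node.value + self.delta), node)
        return node

def mutate(source, target, delta):
    """Return the source text of one mutant."""
    tree = BumpConstant(target, delta).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+

def collect_pairs(source, inputs, n_constants, deltas):
    """Build (change to software, change in behavior) pairs."""
    baseline = run(source, inputs)
    pairs = []
    for target in range(n_constants):
        for delta in deltas:
            outputs = run(mutate(source, target, delta), inputs)
            effect = [new - old for new, old in zip(outputs, baseline)]
            pairs.append(((target, delta), effect))
    return pairs

pairs = collect_pairs(SOURCE, inputs=[1, 2], n_constants=2, deltas=[1, -1])
for change, effect in pairs:
    print(change, effect)
```

Each pair links one mutation (which literal, by how much) to the resulting per-input output difference. A learned model could then be queried in either direction, which is the two-way use the abstract describes.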

2606.03369 2026-06-03 cs.LO math.CT math.LO

A calculus of types in Isbell nuclei

Isbell 核中的类型演算

Juan Luis Gastaldi, Samantha Jarvis, Thomas Seiller, John Terilla

AI总结本文证明线性逻辑中的双正交闭包与富化Isbell对偶中的核构造在给定最小数据（执行乘积和实值度量）下产生相同对象，并由此导出非交换Lambek演算。

AI中文摘要

我们将来自不同数学传统的两种构造等同起来。在线性逻辑和可实现性理论中，逻辑类型是生成的而非预先固定的：从一个配备执行（execution）的实现者宇宙出发，用正交性检验它们之间的交互，并把类型取为双正交闭子集。在富化Isbell对偶中，一个定量关系诱导出一个伴随，其不动点构成一个范畴，即它的核。这两种构造的途径不同；我们证明，在当前设定下，它们产生相同的对象。共享的数据是最小的：一个称为执行的结合性乘积，以及一个实值度量，两者之间不假设任何兼容性。度量未能满足可加性这一事实，既是定义正交性的关系，也是我们用以构造Isbell核的定量关系；而由正交性切割出的类型恰好是相应伴随的不动点。这一等同在两个方向上都有收益。最自然的类型乘积不满足结合性；修复这一缺陷迫使我们采用另一种类型概念，它对复合的两侧都敏感，在其上诱导的乘积是结合的，并且当执行有单位元时带有两个剩余（residual）。由此得到一个非交换的Lambek演算，它直接从执行和正交性导出，而非外加。在反方向上，每个这样的类型从范畴一侧来看，都会生成它自己的定量关系，并随之产生一个导出的伴随和新一代类型；这些导出的类型仍然是原始情形中的类型，可由Lambek演算的剩余计算。我们还证明了该构造的三重排列的相干定理，并在有限维情形下给出了乘积的显式公式。

英文摘要

We identify two constructions from different mathematical traditions. In linear logic and realisability, logical types are generated rather than fixed in advance: one begins with a universe of realisers equipped with execution, uses orthogonality to test their interactions, and takes types to be the biorthogonally closed subsets. In enriched Isbell duality, a quantitative relation induces an adjunction whose fixed points form a category, its nucleus. These constructions proceed by different means; we show that, in the present setting, they produce the same objects. The shared datum is minimal: an associative product, called execution, and a real-valued measurement, with no compatibility assumed between them. The failure of the measurement to be additive is at once the relation defining orthogonality and the quantitative relation whose Isbell nucleus we form, and the types cut out by orthogonality are exactly the fixed points of the associated adjunction. The identification pays off in both directions. The most natural product of types fails to be associative; repairing this failure forces a different notion of type, sensitive to both sides of a composite, on which the induced product is associative and, when execution has units, carries two residuals. What emerges is a noncommutative Lambek calculus, derived directly from execution and orthogonality rather than imposed. In the reverse direction, each such type, read on the categorical side, generates a quantitative relation of its own, and with it a derived adjunction and a further generation of types; these derived types are again types of the original situation, computed by the residuals of the Lambek calculus. We also prove a coherence theorem for the threefold arrangements of this construction and, in the finite-dimensional case, give explicit formulas for the product.
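
The phrase "types are the biorthogonally closed subsets" can be made concrete on a finite toy universe. The sketch below is an assumption-laden illustration, not the paper's construction: the universe `range(6)` and the symmetric relation "x is orthogonal to y when x*y is divisible by 6" are arbitrary stand-ins, whereas the paper defines orthogonality through the failure of a real-valued measurement to be additive.

```python
def orthogonal(subset, universe, perp):
    """All elements of the universe orthogonal to every element of `subset`."""
    return frozenset(y for y in universe if all(perp(x, y) for x in subset))

def biorthogonal(subset, universe, perp):
    """Closure of `subset`: the orthogonal of its orthogonal."""
    return orthogonal(orthogonal(subset, universe, perp), universe, perp)

def is_type(subset, universe, perp):
    """A subset counts as a type when it equals its own biorthogonal closure."""
    return biorthogonal(subset, universe, perp) == frozenset(subset)

# Toy data, chosen only for illustration.
UNIVERSE = range(6)

def perp(x, y):
    return (x * y) % 6 == 0

closure = biorthogonal({2}, UNIVERSE, perp)
print(sorted(closure))                    # closure generated by {2}
print(is_type({2}, UNIVERSE, perp))       # {2} alone is not closed
print(is_type(closure, UNIVERSE, perp))   # its closure is
```

Because the toy relation is symmetric, taking orthogonals is an order-reversing Galois connection, so applying it three times gives the same set as applying it once. That is why the closure of any subset is always a fixed point.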

2606.03367 2026-06-03 cs.IR

Automating Information Extraction and Retrieval for Industrial Spare Parts Pooling

自动化信息提取与检索用于工业备件池化

Dyuman Bulloni, Rocco Felici, Oliver Avram, Anna Valente

AI总结提出PhRAG混合检索增强生成框架，通过命名实体识别结构化异构备件描述并构建虚拟库存池，结合生成式语言模型处理数据稀缺和查询变异性，实现可解释的备件检索。

AI中文摘要

制造业的维护组织试图通过重用现有资产来避免停机和不必要的采购，但主要障碍不是缺少零件，而是缺乏跨站点和合作伙伴的可操作可见性。库存分布分散，描述所用的命名约定不一致，并且包含重复项和不完整的引用，因此正确的零件往往存在于某处，却实际上无法被发现。本文提出PhRAG，一种混合检索增强生成方法，将这种碎片化的库存汇聚为一个虚拟库存池（VSPool），可以作为单一资源进行结构化和搜索。非结构化的异构备件描述通过命名实体识别（NER）结构化到一个共享的虚拟池数据集中，并建立索引以支持稳健的检索，即使用户以自然语言而非精确技术规格表达需求。所提出的模块化流水线利用生成式语言模型的多任务特性，覆盖了使工业备件池化具有挑战性的两个维度：（i）来自不同数据源（例如新合作伙伴、目录、市场列表）的非结构化技术规格通过离线提取处理；（ii）运行时的请求多样性（完整引用、部分引用、规格、价格/条件约束）通过基于混合RAG的搜索引擎处理，该引擎能够检索相关组件并为结果给出理由。该框架展示了在技术规格提取数据稀缺的情况下，生成式方法相比传统NER方法的潜力，并通过为检索到的组件生成理由，克服了标准信息检索系统的不透明性。项目的开源代码见 https://github.com/roccofelici/vspool。

英文摘要

Maintenance organizations in manufacturing try to avoid downtime and unnecessary purchasing by reusing existing assets, but the main obstacle is not a lack of parts but a lack of actionable visibility across sites and partners. Inventories are distributed, described with inconsistent naming conventions, and contain duplicates and partially specified references, so the right part often exists somewhere but remains effectively undiscoverable. The paper proposes PhRAG, a hybrid Retrieval-Augmented Generation for Pooling this fragmented landscape into a Virtual Stock Pool (VSPool) that can be structured and searched as a single resource. Unstructured, heterogeneous spare part descriptions are structured via Named Entity Recognition (NER) into a shared virtual pool dataset and indexed to support robust retrieval even when users express needs in natural language rather than exact technical specifications. The proposed modular pipeline leverages the multitasking nature of generative language models to cover two dimensions that make industrial parts pooling challenging: (i) unstructured technical specifications from diverse data sources (e.g. new partners, catalogs, marketplace listings) are handled through an offline extraction and (ii) request variability at runtime (references, partial references, specifications, price/condition constraints) is handled through a hybrid RAG-based search engine capable of retrieving relevant components and justifying results. The framework demonstrates the potential of generative approaches compared with traditional NER approaches in the presence of data scarcity for technical specifications extraction and overcomes the opacity of standard information retrieval systems by generating justifications for retrieved components. The project's open-source code can be found at https://github.com/roccofelici/vspool.

2606.03364 2026-06-03 cs.DC cs.DB cs.PF cs.SE

BlobShuffle: Cost-Effective Repartitioning in Stream Processing Systems via Object Storage Exemplified with Kafka Streams

BlobShuffle：通过对象存储实现流处理系统中经济高效的重分区——以Kafka Streams为例

Sören Henning, Otmar Ertl, Adriano Vogel

AI总结提出BlobShuffle方法，利用云对象存储作为中间交换层，通过批量存储和紧凑通知机制降低流处理系统中跨可用区数据重排的网络成本，实验表明相比原生Kafka Streams可减少40倍以上成本且延迟低于2秒。

AI中文摘要

数据流的混洗（shuffle）或重分区是现代流处理框架在大规模分布式环境下支持有状态工作负载的关键操作。然而，在当今的云部署中，混洗可能成为主要的成本来源，原因既包括跨多个可用区（AZ）的大量网络流量，也包括大规模运行高吞吐、强一致性消息骨干网所带来的运维负担。我们提出BlobShuffle，一种面向流处理系统的经济高效的混洗新方法，它利用云对象存储作为中间交换层。BlobShuffle不直接发送所有被混洗的记录，而是将记录分组为批次，将这些批次存储在云对象存储中，并且只转发紧凑的通知。下游算子根据这些通知取回相关批次并提取相应记录。BlobShuffle通过可配置的批处理和分布式缓存机制在成本效率与延迟之间取得平衡。BlobShuffle以Kafka Streams附加组件的形式实现，只需对现有应用程序做极少的代码更改，不修改Kafka和底层基础设施，并保留Kafka Streams的一致性和正确性保证。在基于Kubernetes的AWS部署上进行的大规模实验评估表明，与原生Kafka Streams混洗相比，BlobShuffle可将混洗成本降低40倍以上，同时将第95百分位混洗延迟保持在2秒以下。此外，它可扩展到处理超过2 GiB/s的数据，且在我们的实验中未遇到可扩展性上限，这表明BlobShuffle能够在大规模下经济地支持混洗密集型工作负载。

英文摘要

Shuffling or repartitioning data streams is an essential operation of state-of-the-art stream processing frameworks to support stateful workloads in a large-scale, distributed setting. In today's cloud deployments, however, shuffling can become a major cost driver due to substantial network traffic across multiple availability zones (AZs) as well as an operational burden when operating a high-throughput, strongly consistent messaging backbone at scale. We present BlobShuffle, a novel approach to cost-effective shuffling for stream processing systems that leverages cloud object storage as an intermediate exchange layer. Instead of sending all shuffled records directly, BlobShuffle groups records into batches, stores these batches in cloud object storage, and forwards only compact notifications. Downstream operators use these notifications to retrieve the relevant batches and extract the corresponding records. BlobShuffle balances cost efficiency and latency through configurable batching and a distributed caching mechanism. BlobShuffle is implemented as an add-on for Kafka Streams that requires only minimal code changes to existing applications, leaves Kafka and the underlying infrastructure unmodified, and preserves Kafka Streams' consistency and correctness guarantees. In a large-scale experimental evaluation on a Kubernetes-based AWS deployment, we show that BlobShuffle can reduce shuffling costs by more than 40x compared to native Kafka Streams shuffling while keeping the 95th percentile shuffle latency below 2 seconds. Moreover, it scales to processing more than 2 GiB/s without encountering a scalability limit in our experiments, indicating that BlobShuffle can economically support shuffle-intensive workloads at large scale.
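
The batch-then-notify exchange described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the BlobShuffle implementation: the in-memory `InMemoryObjectStore` stands in for cloud object storage, a plain dictionary stands in for the notification channel (Kafka in the paper), and all class and function names are hypothetical. The caching layer and the consistency guarantees mentioned in the abstract are not modeled.

```python
import json
import zlib
from collections import defaultdict

class InMemoryObjectStore:
    """Stand-in for a cloud object store (assumption: simple put/get by key)."""
    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]

def partition_of(record_key, num_partitions):
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(record_key.encode("utf-8")) % num_partitions

class BatchingShuffler:
    """Upstream side: buffer records, write each batch once, notify partitions."""
    def __init__(self, store, num_partitions, batch_size):
        self.store = store
        self.num_partitions = num_partitions
        self.batch_size = batch_size
        self.pending = []
        self.notifications = defaultdict(list)  # partition -> list of blob keys
        self.next_batch_id = 0

    def send(self, key, value):
        self.pending.append((key, value))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        blob_key = f"batch-{self.next_batch_id}"
        self.next_batch_id += 1
        self.store.put(blob_key, json.dumps(self.pending).encode("utf-8"))
        # Only the compact blob key travels to each interested partition.
        for p in {partition_of(k, self.num_partitions) for k, _ in self.pending}:
            self.notifications[p].append(blob_key)
        self.pending = []

def consume(store, notifications, partition, num_partitions):
    """Downstream side: fetch notified batches and keep this partition's records."""
    records = []
    for blob_key in notifications[partition]:
        batch = json.loads(store.get(blob_key).decode("utf-8"))
        records.extend((k, v) for k, v in batch
                       if partition_of(k, num_partitions) == partition)
    return records

store = InMemoryObjectStore()
shuffler = BatchingShuffler(store, num_partitions=3, batch_size=2)
sent = [(f"user-{i}", i) for i in range(5)]
for k, v in sent:
    shuffler.send(k, v)
shuffler.flush()  # write the final partial batch
received = [r for p in range(3)
            for r in consume(store, shuffler.notifications, p, 3)]
```

In this sketch the batch size is the single knob that trades latency for the number of object-store writes, which mirrors the cost/latency balance the abstract attributes to configurable batching.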

2606.03362 2026-06-03 cs.DL

Emerging and established topics in drone research: Citation impact and knowledge flows across China, the United States, the EU, Ukraine, and Russia (2020-2025)

无人机研究中的新兴与成熟主题：中国、美国、欧盟、乌克兰和俄罗斯的引用影响与知识流动（2020-2025）

Myroslava Hladchenko

AI总结利用OpenAlex文献数据，研究2020-2025年间无人机研究的新兴与成熟主题，分析中国、美国、欧盟、乌克兰和俄罗斯之间的引用影响与知识流动，发现中国在科学产出和国内引用集中度上日益主导，而美国与欧盟保持更国际化的引用结构，且中国与美欧的引用交流增加。

AI中文摘要

本研究利用OpenAlex文献数据，考察了2020-2025年间无人机研究中的新兴与成熟主题，重点关注中国、美国、欧盟、乌克兰和俄罗斯之间的引用影响与知识流动。结果显示，无人机相关科学呈现出科学产出、引用集中度和国际知识交流方面日益增长的地缘政治不对称性。特别是，中国在科学产出、作者贡献份额和国内引用循环方面日益占据主导地位。相比之下，美国和欧盟国家保持了相对更国际化的引用结构。然而，中国所属出版物越来越融入全球引用网络，特别是通过与美国和欧洲国家日益增长的引用交流。值得注意的是，作者身份和引用模式的解释因高比例未识别所属机构的出版物而复杂化，在弱信号主题中，2025年这一比例达到50%。这些发现强调了开发全面国家研究机构注册库（RORs）的重要性。尽管中国显示出引用优势，但这部分是由较高的国内引用集中度而非完全由全球整合驱动。此外，中国从欧盟14国和美国进口的知识比例仍高于其出口，且这种不对称性随时间增加。欧盟14国在弱信号主题中保持了最强的引用影响，表明其在塑造新兴研究方向方面发挥了更突出的作用。同时，在强信号和弱信号主题中，中国所属出版物引用美国的频率高于欧盟14国，这种模式在弱信号领域尤为明显。

英文摘要

This study examined emerging and established topics in drone research, focusing on citation impact and knowledge flows across China, the United States, the EU, Ukraine, and Russia between 2020 and 2025 using OpenAlex bibliographic data. The findings revealed that drone-related science is characterised by growing geopolitical asymmetries in scientific production, citation concentration, and international knowledge exchange. In particular, China increasingly dominated scientific production, fractional authorship contribution, and domestic citation circulation. In contrast, the United States and EU countries maintained comparatively more internationally distributed citation structures. However, China-affiliated publications became increasingly integrated into global citation networks, particularly through growing citation exchange with the United States and European countries. Notably, the interpretation of authorship and citation patterns was complicated by the high proportion of publications with unidentified affiliations, which reached 50% in 2025 within weak-signal topics. These findings underscore the importance of developing comprehensive national Research Organisation Registries (RORs). Although China demonstrated a citation advantage, this was partly driven by high internal domestic citation concentration rather than exclusively by global integration. Moreover, China still imported proportionally more knowledge from the EU-14 and the United States than it exported, with this asymmetry increasing over time. EU-14 countries maintained the strongest citation impact in weak-signal topics, suggesting a more prominent role in shaping emerging research directions. At the same time, China-affiliated publications cited the United States more frequently than the EU-14 in both strong- and weak-signal topics, with this pattern being particularly pronounced in weak-signal areas.

2606.03354 2026-06-03 cs.CR

ImageAuditor: Membership Inference Attack against Image-based Retrieval-Augmented Generation

ImageAuditor: 针对基于图像的检索增强生成系统的成员推理攻击

Jinghuai Zhang, Pengyue Yu, Zhexiao Lin, Kunlin Cai, Fnu Suya, Yuan Tian

AI总结提出首个针对基于图像的检索增强生成（IRAG）的成员推理攻击方法ImageAuditor，通过奖励引导策略优化解决跨模态检索和判别信号提取两大挑战，在仅需四次查询时AUROC超过80%。

AI中文摘要

基于图像的检索增强生成（IRAG）以从外部数据库检索到的参考图像作为冻结生成器的条件输入，支持文本到图像（T2I）和问答（Q&A）任务。由于这些数据库不透明且来自网络抓取，版权持有者需要方法来审计特定图像是否出现在其中。虽然先前的工作采用成员推理攻击（MIA）来审计单模态、基于文本的RAG，但由于两个关键挑战，这些方法无法迁移到IRAG。首先是跨模态检索：文本RAG MIA通过将目标段落的内容注入查询来强制检索该段落，这在IRAG中不可行，因为图像无法嵌入到文本查询中；即使是准确的图像描述也无法弥合模态差距。其次是判别信号提取：文本RAG MIA通过提示生成器回答关于目标段落的多个问题来提取成员信号，而IRAG中的T2I生成器输出的是图像，而不是遵循问答指令。为填补这一空白，我们提出首个针对IRAG的MIA，即ImageAuditor，它将每个攻击查询分解为检索段和提取段，从而能够针对每个挑战进行专门优化。对于检索，我们提出奖励引导策略优化（RGPO），它根据按奖励排序的候选来更新随机策略，以在跨模态嵌入空间中搜索，并具有有限样本最优性保证，以平衡探索与利用。对于提取，我们分析MIA得分的分布，以指导提示策略和评分规则的协同设计，并为T2I和Q&A任务导出各自的任务特定实例。我们通过K-means聚类聚合跨查询的信号，以做出可靠的成员判定。在各种IRAG系统上，ImageAuditor对每张被审计图像仅需四次查询即可使AUROC超过80%，并在不同设置下保持鲁棒性。

英文摘要

Image-based Retrieval-Augmented Generation (IRAG) conditions a frozen generator on reference images retrieved from an external database, supporting both text-to-image (T2I) and question answering (Q&A) tasks. Because these databases are opaque and web-scraped, copyright holders need ways to audit whether specific images appear in them. While prior work employs membership inference attacks (MIAs) to audit uni-modal, text-based RAG, they fail to transfer to IRAG due to two key challenges. First, cross-modal retrieval: text-RAG MIAs force retrieval of the target passage by injecting its content into the query, which is unavailable in IRAG since images cannot be embedded into text queries; even accurate image captions fail to bridge the modality gap. Second, discriminative signal extraction: text-RAG MIAs extract membership signals by prompting the generator to answer multiple questions over the target passage, whereas T2I generators in IRAG produce images rather than follow Q&A commands. To fill this gap, we introduce the first MIA tailored to IRAG, ImageAuditor, which decomposes each attack query into a retrieval segment and an extraction segment, enabling dedicated optimization for each challenge. For retrieval, we propose Reward-Guided Policy Optimization (RGPO), which updates a stochastic policy from reward-ranked candidates to navigate the cross-modal embedding landscape and admits finite-sample optimality guarantees to balance exploration and exploitation. For extraction, we analyze the distribution of the MIA score to guide the co-design of the prompting strategy and scoring rule, and derive task-specific instantiations for T2I and Q&A tasks. We aggregate signals across queries via K-means clustering for reliable membership decisions. Across various IRAG systems, ImageAuditor exceeds 80% AUROC with only four queries per audited image and remains robust across diverse settings.

2606.03352 2026-06-03 cs.NI

Rain: RDMA-assisted In-Network Scheduling for Microsecond-scale Workloads

Rain：面向微秒级工作负载的RDMA辅助网络内调度

Zhihuang Ma, Xingming Cui, Xiaoliang Chen, Zuqing Zhu

AI总结针对微秒级服务时间与严格尾延迟需求，提出基于可编程交换机的RDMA辅助网络内调度器Rain，通过双向排队机制、交换驱动RDMA写入和切片感知调度，实现比现有最优方案高1.75倍的吞吐量。

Comments 21 pages, 11 figures. Published in Proceedings of the ACM on Networking (PACMNET), CoNEXT2

DOI: 10.1145/3808670
Journal ref: Proc. ACM Netw. 4, CoNEXT2, Article 22, June 2026, 21 pages

AI中文摘要

现代数据中心应用日益需要微秒级服务时间和严格的尾延迟保证，而现有的网络内任务调度器由于其固有限制很难满足这些要求。具体而言，基于软件的调度器难以平衡吞吐量和延迟，而基于交换机的设计要么缺乏全局协调，要么严重依赖数据包重循环，要么对大型任务的支持有限。鉴于现有技术的这些限制，我们在本文中提出了Rain，一种构建在可编程交换机之上的RDMA辅助网络内调度器，它维护集中式队列，同时限制工作节点本地队列的长度。Rain引入了一种双向的交换机上排队机制，直接在交换机中缓冲并匹配任务与工作节点发出的令牌，避免了工作节点侧的轮询，并在无需全局聚合的情况下近似“加入有界最短队列”（join-bounded-shortest-queue）的最优行为。由交换机驱动的RDMA引擎通过单边WRITE多播预先写入任意大小的任务，仅在交换机上保留紧凑的元数据。切片感知调度进一步将决策局部化到更同质的队列，减少了由离散度引起的队头阻塞。此外，我们的研究表明，现实系统可能偏离理论预测：更浅的工作节点队列并不总能改善尾延迟。利用这一见解，Rain采用了一种自适应调度策略，在运行时优化工作节点队列深度和工作节点到切片的映射。使用真实应用RocksDB的评估表明，在满足相同尾延迟要求的前提下，Rain的吞吐量比表现最佳的现有技术高1.75倍。

英文摘要

Modern data center applications increasingly require microsecond-scale service time with strict tail latency requirements, which can hardly be realized with existing in-network task schedulers due to their inherent limitations. Specifically, software-based schedulers struggle to balance throughput and latency, while switch-based designs either lack global coordination, rely on packet recirculation heavily, or only offer limited support for large tasks. In light of these restrictions of the state-of-the-arts (SOTAs), we, in this work, propose Rain, an RDMA-assisted in-network scheduler built atop programmable switches that maintains centralized queues while bounding worker-local queues. Rain introduces a bidirectional on-switch queuing mechanism to buffer and match tasks and worker-issued tokens directly in the switch, avoiding worker-side polling and approximating the optimal behavior of join-bounded-shortest-queue without global aggregation. A switch-driven RDMA engine pre-writes arbitrarily large tasks via one-sided WRITE multicasts, keeping only compact metadata on the switch. Slice-aware scheduling further localizes decisions to more homogeneous queues, reducing dispersion-induced head-of-line blocking. Moreover, our study reveals that real-world systems can diverge from theoretical predictions: shallower worker queues do not always improve tail latency. Leveraging this insight, Rain incorporates an adaptive scheduling strategy to optimize worker queue depths and worker-to-slice mappings at runtime. Evaluations with the real-world application RocksDB show that Rain achieves 1.75x higher throughput than the best-performing SOTA while satisfying the same tail latency requirement.
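
The bidirectional queuing idea in this abstract (buffer whichever side arrives first, then match tasks with worker-issued tokens) can be sketched in a few lines. This is a software illustration only, under stated assumptions: the FIFO matching order and the class and method names are hypothetical, and nothing here models the programmable-switch data plane, RDMA writes, or slice-aware scheduling.

```python
from collections import deque

class BidirectionalQueue:
    """Match incoming tasks with worker tokens; buffer whichever side is waiting."""
    def __init__(self):
        self.tasks = deque()    # tasks waiting for a free worker slot
        self.tokens = deque()   # worker tokens waiting for a task
        self.dispatched = []    # (task, worker) pairs, in dispatch order

    def on_task(self, task):
        if self.tokens:
            # A worker already announced free capacity: dispatch immediately.
            self.dispatched.append((task, self.tokens.popleft()))
        else:
            self.tasks.append(task)

    def on_token(self, worker):
        if self.tasks:
            # A task is already waiting: hand it to the worker that just freed up.
            self.dispatched.append((self.tasks.popleft(), worker))
        else:
            self.tokens.append(worker)

q = BidirectionalQueue()
q.on_token("w1")   # w1 is idle before any task exists
q.on_task("t1")    # matched with w1 at once
q.on_task("t2")    # no token available, so it is buffered
q.on_task("t3")
q.on_token("w2")   # releases t2
q.on_token("w1")   # releases t3
q.on_token("w2")   # nothing waiting, so the token is buffered
print(q.dispatched)
```

At most one of the two deques is non-empty at any time, so workers never need to poll. If each worker issues only a bounded number of outstanding tokens, its local queue depth stays bounded, which is the property the abstract relates to join-bounded-shortest-queue.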

2606.03351 2026-06-03 cs.DM math.CO

Reflective Numeration Systems I: a Global Standpoint

反射计数系统 I：全局视角

Benoît Rittaud

AI总结提出一个框架，通过引入Z-Gray积和理论工具，将标准b进制格雷码推广到k-bonacci格雷码及其他类型，并保持幂结合性和翻转数字性质。
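
For orientation, the standard reflected b-ary Gray code that this summary takes as its starting point can be generated as below. This is the textbook construction only, written as a sketch; it does not implement the paper's Z-Gray product or its k-bonacci generalization, and the function names are illustrative.

```python
def to_digits(n, base, width):
    """Base-`base` digits of n, most significant first, padded to `width`."""
    digits = []
    for _ in range(width):
        n, d = divmod(n, base)
        digits.append(d)
    return digits[::-1]

def reflected_gray(n, base, width):
    """n-th codeword of the reflected `base`-ary Gray code with `width` digits."""
    reflected = False
    code = []
    for d in to_digits(n, base, width):
        # Inside a reflected block the digit runs backwards.
        g = base - 1 - d if reflected else d
        code.append(g)
        # An odd emitted digit means the next level is traversed in reverse.
        if g % 2 == 1:
            reflected = not reflected
    return code

codes = [reflected_gray(n, base=3, width=2) for n in range(9)]
for c in codes:
    print(c)
```

Consecutive codewords differ in exactly one position, and by exactly one unit there, which is the defining property the generalizations are meant to preserve.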

2606.03350 2026-06-03 eess.SY cs.SY

Navigating the unknown in large-scale operational transformation programs: The "Sirius Days" framework as a 'pilot-organization' for characterizing emerging issues

大规模运营转型项目中未知领域的导航：“天狼星日”框架作为表征新兴问题的“试点组织”

Françoise Zink, Chipten Valibhay, Jose Bonet Faus

AI总结本文通过纵向案例研究，提出“天狼星日”框架作为一种“试点组织”，通过解构假设、提出猜想并实地测试，在认知、社会和规范三个维度上生成五种韧性杠杆，有效应对大型数字化转型项目中的未知挑战。

AI中文摘要

大规模数字化转型项目必须同时维持现有运营并应对IT-业务-运营交互中涌现的深度未知——这是传统项目治理框架难以充分应对的挑战。基于对一个转型项目的纵向案例研究，我们调查了“天狼星日”，即每月一次的高级管理层务虚会，该会议被识别为关键成功因素。我们表明，该框架构成一个试点组织：一种组织性“装置”（或设备），它解构已有知识或假设，制定严谨的猜想，并在实际条件下进行测试。它产生了五种韧性杠杆——未知的系统性表征、早期异常识别、绩效规范的扩展、通过探究社区创造社会资本，以及跨规模的组织敏捷性扩展——揭示了一种组织性“装置”的模型，该模型在认知、社会和规范维度上实现了对未知的导航。

英文摘要

Large-scale digital transformation programs must simultaneously sustain existing operations and navigate deep unknowns emerging from IT-business-operations interactions, a challenge conventional project governance frameworks inadequately address. Based on a longitudinal case study of a transformation program, we investigate the "Sirius Days," a monthly senior management retreat identified as a critical success factor. We show that this framework constitutes a pilot-organization: an organizational 'dispositif' (or apparatus) that deconstructs established knowledge or assumptions, formulates rigorous conjectures, and tests them in real conditions. It generated five resilience levers (systemic characterization of unknowns, early anomaly discernment, expansion of performance norms, social capital creation through a community of inquiry, and expansion of organizational agility across scales), revealing a model of an organizational 'dispositif' that operationalizes navigating unknowns across cognitive, social, and normative dimensions.

2606.03349 2026-06-03 cs.SE

AlgoTouch: An Execution-Centered Approach to Incremental Construction of Imperative Programs

AlgoTouch: 一种以执行为中心的命令式程序增量构建方法

Michel Adam, Patrice Frison, Sabine Letellier Zarshenas, Moncef Daoud

AI总结提出AlgoTouch系统，通过直接操作程序数据并记录执行行为，增量构建命令式程序，自动合成控制结构并生成多种主流语言代码。

AI中文摘要

命令式语言的程序构建仍然主要基于编写文本代码，这些代码指定操作程序数据的指令序列。这种方法要求开发者预知指令对演化数据状态的影响，增加了认知负荷以及早期和增量开发中出错的可能性。本文提出了AlgoTouch，一种基于执行的系统，通过直接操作程序数据来增量构建命令式程序。程序不是通过组装语法结构，而是通过执行具体的数据转换来构建，这些转换被记录并整合到内部中间表示中。AlgoTouch依赖于一个显式的概念机，它暴露数据存储、计算和控制流，使得观察到的执行与程序结构之间能够持续对齐。该系统的一个核心贡献在于从执行行为中确定性地合成控制结构。条件语句从观察到的比较中推导出来，而迭代行为则封装在循环宏中，支持非线性和增量构建。这种设计允许部分和不完整的程序被执行、细化和完成，同时保持语义一致性。AlgoTouch自动生成多种主流命令式语言（包括Python、C、C++和Java）的正确且可读的程序。该系统通过在一组代表性算法基准上进行工程级验证来评估，证明了正确性、表达力、鲁棒性和语言无关性。通过将执行、构建和代码生成集成在一个统一架构中，这项工作为交互式程序构建引入了一种替代模型，并贡献了一类新的以执行为中心的开发系统。

英文摘要

Program construction in imperative languages remains largely based on writing textual code that specifies sequences of instructions operating on program data. This approach requires developers to anticipate the effects of instructions on evolving data states, which increases cognitive load and the likelihood of errors during early and incremental development. This paper presents AlgoTouch, an execution-based system for incremental construction of imperative programs through direct manipulation of program data. Rather than assembling syntactic structures, programs are constructed by executing concrete data transformations that are recorded and incorporated into an internal intermediate representation. AlgoTouch relies on an explicit notional machine that exposes data storage, computation, and control flow, enabling continuous alignment between observed execution and program structure. A central contribution of the system lies in its deterministic synthesis of control structures from execution behavior. Conditional statements are derived from observed comparisons, while iterative behaviors are encapsulated in loop macros that support non-linear and incremental construction. This design enables partial and incomplete programs to be executed, refined, and completed while preserving semantic consistency. AlgoTouch automatically generates correct and readable programs in several mainstream imperative languages, including Python, C, C++, and Java. The system is evaluated through engineering-level validation on a representative set of algorithmic benchmarks, demonstrating correctness, expressiveness, robustness, and language independence. By integrating execution, construction, and code generation within a unified architecture, this work introduces an alternative model for interactive program construction and contributes a new class of execution-centered development systems.

2606.03289 2026-06-03 cs.CR

Privilege Risk Evolution for Non-Human Identities: A Temporal Fiber Model for Cloud IAM

非人类身份的特权风险演化：云IAM的时间纤维模型

Christophe Parisel

AI总结针对云IAM中非人类身份的特权风险演化问题，提出基于图纤维化与强连通分量的时间纤维模型，通过空间商、谱系分区和窗口SCC分析三层框架，在Azure租户上验证了棘轮型特权电路可预测长期结构稳定性。

2606.03282 2026-06-03 cs.CE

GROSS: German Rail Open-Source SUMO Scenario

GROSS：德国铁路开源SUMO场景

Juri Penell, Damian Dailisan

AI总结提出GROSS开源管道，结合OpenStreetMap铁路基础设施与GTFS时刻表，通过拓扑感知的站点映射和分层路由生成全国规模SUMO铁路场景，显著减少传送伪影并支持延迟传播分析。

AI中文摘要

微观仿真能够实现智能交通系统中的可重复评估，然而大多数开放的SUMO场景和工具链仍以道路交通为中心，铁路因其对公共交通的重要性以及对网络范围中断的敏感性而代表性不足。我们提出了德国铁路开源场景（GROSS），这是一个开放管道，结合OpenStreetMap铁路基础设施与GTFS时刻表，为SUMO（城市移动性仿真）生成全国规模的铁路场景。现有的转换通常依赖于仅基于几何的站点到轨道匹配以及不一致的站台/轨道分配，这可能导致路由异常和以传送伪影为主的不可靠仿真。GROSS通过分层车站模型实现拓扑感知的站点映射，随后进行车站级路由，并带有验证和针对性修复。在多个德国地区，GROSS将每辆车的平均传送次数减少了1.7–76.8倍，与原始SUMO管道相比缩短了延误，并能够端到端生成覆盖全德国的场景，包含35,925次行程，用于与运营商报告的延误统计进行比较。虽然剩余的长延误突显了可用时刻表元数据和铁路调度建模的局限性，但GROSS降低了构建可扩展、完全开放的铁路仿真以及研究国家尺度延误传播的门槛。

英文摘要

Microscopic simulation enables reproducible evaluation in intelligent transportation systems, yet most open SUMO scenarios and toolchains remain road-traffic centric, leaving rail underrepresented despite its importance for public transport and its sensitivity to network-wide disruptions. We present the German Rail Open-Source Scenario (GROSS), an open pipeline that combines OpenStreetMap railway infrastructure with GTFS schedules to generate nation-scale rail scenarios for SUMO (Simulation of Urban MObility). Existing conversions often rely on geometry-only stop-to-track matching and inconsistent platform/track assignments, which can create routing anomalies and unstable simulations dominated by teleportation artefacts. GROSS addresses this with topology-aware stop mapping via a hierarchical station model, followed by station-level routing with validation and targeted repair. Across multiple German regions, GROSS reduces average teleportations per vehicle by a factor of 1.7 to 76.8, shortens delays compared to the vanilla SUMO pipeline, and enables end-to-end generation of a Germany-wide scenario with 35,925 trips for comparisons with operator-reported delay statistics. While the remaining long delays highlight limitations in available timetable metadata and rail dispatch modeling, GROSS lowers the barrier to building scalable, fully open rail simulations and to studying delay propagation at country scale.

2606.03271 2026-06-03 cs.HC

Agentic Relationship Harm: Benchmarking and Gating Relational Manipulation in AI Agents

代理关系伤害：AI代理中关系操纵的基准测试与门控

Pei-Sze Tan, Tasuku Igarashi, Isao Echizen

AI总结针对AI代理可能造成的关系操纵风险，提出了一个110提示的基准测试、关系特定标注框架和轻量级后生成策略门控，以评估和缓解代理关系伤害。

Comments 13 pages, 3 figures

AI中文摘要

基于大语言模型的AI代理不仅可以协助合法任务，还可能用于关系操纵。AI代理可被用来帮助用户维持欺骗性身份、加剧情感依赖、孤立目标或为后续剥削做准备。我们将这种风险概念化为代理关系伤害：利用接收者脆弱性、说服性影响和关系权力不对称的工作流级协助。现有的安全评估和通用防护措施通常将伤害视为孤立输出的属性，忽略了角色敏感的交互模式。为研究这一点，我们引入了一个110提示的基准测试，包含平衡的攻击方和受害方案例、一个关系特定的标注框架，以及一个用于本地代理部署的轻量级后生成策略门控。在我们的评估中，关系特定门控在自动评判下优于通用安全提示，在主基准测试和多轮压力测试中未发现评判员识别的有害合规案例，同时保留了受害方的保护性干预。这些结果表明，关系伤害是一个独特的社会技术风险面，角色敏感评估加上轻量级策略门控提供了一条超越通用拒绝提示的实用路径。

英文摘要

AI agents built on large language models can assist not only legitimate tasks but also relational manipulation. AI agents can be used to help a user maintain a deceptive identity, intensify emotional dependency, isolate a target, or prepare for later extraction. We conceptualise this risk as agentic relationship harm: workflow-level assistance that can exploit recipient vulnerability, persuasive influence, and relational power asymmetry. Existing safety evaluations and generic guardrails often treat harmfulness as a property of isolated outputs, missing role-sensitive interaction patterns. To study this, we introduce a 110-prompt benchmark with balanced attacker- and victim-side cases, a relationship-specific labelling framework, and a lightweight post-generation policy gate for local agent deployments. In our evaluation, the relationship-specific gate outperforms generic safety prompting under automated judging, with no judge-identified harmful-compliance cases on the main benchmark or multi-turn stress test while preserving victim-side protective intervention. These results suggest that relationship harm is a distinct sociotechnical risk surface and that role-sensitive evaluation plus lightweight policy gating offers a practical path beyond generic refusal prompting.

2606.03266 2026-06-03 cs.HC

ReforMe: Re-Shaping Documents with Contextual Prompting and Layout-Aware Propagation

ReforMe: 通过上下文提示和布局感知传播重塑文档

Nabin Khanal, Tongyan Wang, Jui-Cheng Chiu, Ningning Nicole Kong, Hannah Yanhua Zong, Yingjie Victor Chen

AI总结提出一个交互式文档数字化系统，结合布局感知解析、OCR和基于LLM的重建，通过用户驱动精炼和布局感知传播机制，提高复杂文档的校正效率和结构化表示能力。

AI中文摘要

数字化包含手写内容、不规则表格和异构布局的复杂文档仍然具有挑战性，因为传统光学字符识别（OCR）系统无法捕捉书写细节、作者特定惯例和文档结构，而近期基于LLM的方法缺乏精确、可扩展的校正机制。我们提出了一个交互式文档数字化系统，该系统集成了布局感知解析、OCR和基于LLM的重建，并支持用户驱动的精炼。该系统基于一项形成性研究，该研究识别了实际数字化工作流程中的关键挑战和交互需求。它支持直接编辑和自然语言指令，并引入了一种布局感知传播机制，该机制将用户校正推广到结构相似的区域。这不仅实现了高效的错误校正，还能将文档重塑为结构化、可分析的表征。我们通过一项针对真实文档的受试者内用户研究（n=12）评估了该系统。结果显示校正效率提高，重复性工作减少，证明了更有效且可控的文档数字化流程。

英文摘要

Digitizing complex documents with handwritten content, irregular tables, and heterogeneous layouts remains challenging, as traditional Optical Character Recognition (OCR) systems fail to capture writing nuances, author-specific conventions, and document structure, and recent LLM-based approaches lack mechanisms for precise, scalable correction. We present an interactive document digitization system that integrates layout-aware parsing, OCR, and LLM-based reconstruction with user-driven refinement. The system is informed by a formative study that identifies key challenges and interaction needs in real-world digitization workflows. It supports both direct edits and natural-language instructions, and introduces a layout-aware propagation mechanism that generalizes user corrections across structurally similar regions. This enables not only efficient error correction but also document re-shaping into structured, analyzable representations. We evaluate the system through a within-subjects user study (n=12) on real-world documents. Results show improved correction efficiency and reduced repetitive effort, demonstrating a more effective and controllable document digitization procedure.

2606.03255 2026-06-03 cs.CE

Multi-Agent Framework Leveraging Knowledge Graphs for Virtual Commissioning Models

基于知识图谱的多智能体框架用于虚拟调试模型

Max Diekmann, Jonas Nitzler, Jan Fischer, Hans-Jürgen Pfisterer, Dirk Hartmann

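AI summary: Proposes a knowledge-graph-driven multi-agent framework that extracts structured data from Siemens TIA Portal and NX MCD, converts it into graph representations, and semi-automates system understanding, simulation component generation, and cross-domain signal mapping for virtual commissioning models.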

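AI-translated abstract

Virtual commissioning models (VCMs) of discrete manufacturing systems are used to verify automation behavior before physical deployment, but creating and maintaining them remains labor-intensive. The relevant engineering information is spread across programmable logic controller (PLC) engineering projects (such as Siemens TIA Portal) and kinematic simulation models (such as Siemens NX Mechatronics Concept Designer, NX MCD), where it is stored in incompatible, tool-specific data structures. In practice, IEC 61131-3-based PLC programs and variables are engineered separately from rigid-body and kinematic simulation objects such as parts, joints, sensors, and actuators. As a result, understanding system behavior, generating simulation components, and mapping PLC variables to the corresponding simulation objects all require cross-domain expertise and are largely manual. This paper proposes a knowledge-graph-based multi-agent framework for semi-automated VCM development. A deterministic setup process extracts structured data from Siemens TIA Portal and Siemens NX MCD and converts both sources into graph-based representations in a shared graph database. The framework uses a hierarchical multi-agent architecture to support three classes of tasks in early-stage VCM development: system understanding, simulation component generation, and cross-domain signal mapping. It provides natural-language access to engineering knowledge, template-guided generation of executable NX Open journal scripts, and ranked mapping suggestions between PLC variables and NX MCD simulation objects. An evaluation on a laboratory-scale discrete manufacturing system shows that the approach reduces the effort of manual cross-domain interpretation and makes recurring VCM engineering tasks more actionable.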

English abstract

Virtual commissioning models (VCMs) of discrete manufacturing systems are used to validate automation behavior before physical deployment, but creating and maintaining them remains labor-intensive. Relevant engineering information is distributed across programmable logic controller (PLC) engineering projects, such as Siemens TIA Portal, and kinematic simulation models, such as Siemens NX Mechatronics Concept Designer (NX MCD), where it is stored in incompatible, tool-specific data structures. In practice, IEC 61131-3-based PLC programs and variables are engineered separately from rigid-body and kinematic simulation objects such as parts, joints, sensors, and actuators. As a result, understanding system behavior, generating simulation components, and mapping PLC variables to corresponding simulation objects require cross-domain expertise and remain largely manual. This paper presents a knowledge-graph-grounded multi-agent framework for semi-automated VCM development. A deterministic setup process extracts structured data from Siemens TIA Portal and Siemens NX MCD and transforms both sources into graph-based representations within a shared graph database. The framework uses a hierarchical multi-agent architecture to support three task classes in early-stage VCM development: system understanding, simulation component generation, and cross-domain signal mapping. It provides grounded natural-language access to engineering knowledge, template-guided generation of executable NX Open journal scripts, and ranked mapping suggestions between PLC variables and NX MCD simulation objects. Evaluation on a laboratory-scale discrete manufacturing system shows that the approach reduces manual cross-domain interpretation effort and makes recurring VCM engineering tasks more actionable.
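
The abstract mentions ranked mapping suggestions between PLC variables and NX MCD simulation objects but gives no scoring method. The Python sketch below shows one simple way such a ranking could be produced, purely as an illustration: it scores candidates by Jaccard overlap of name tokens. The scoring rule, the function names (`tokens`, `jaccard`, `rank_candidates`), and the example identifiers are all assumptions made for this example and are not taken from the paper, TIA Portal, or NX Open.

```python
import re
from typing import List, Set, Tuple

# Split identifiers such as "Conveyor1_MotorOn" into word and number tokens.
_TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def tokens(name: str) -> Set[str]:
    """Lower-cased set of name tokens."""
    return {t.lower() for t in _TOKEN.findall(name)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_candidates(plc_variable: str, sim_objects: List[str],
                    top_k: int = 3) -> List[Tuple[str, float]]:
    """Return up to top_k simulation objects, best name match first."""
    var_tokens = tokens(plc_variable)
    scored = [(obj, jaccard(var_tokens, tokens(obj))) for obj in sim_objects]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical identifiers, invented for illustration.
suggestions = rank_candidates(
    "Conveyor1_MotorOn",
    ["Conveyor1_SpeedControl", "Gripper_PositionControl", "Conveyor1_Motor"],
)
for name, score in suggestions:
    print(f"{name}: {score:.2f}")
```

With these inputs, `Conveyor1_Motor` ranks first because it shares three of four tokens with the PLC variable. Name similarity alone is a weak signal; a framework like the one described could plausibly combine it with graph context such as data types or the parent component, which this sketch does not attempt.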
