arXivDaily: arXiv daily academic digest, updated Monday through Friday
CS (Computer Science): 1059
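2606.12144 2026-06-11 cs.SC cs.CC New submission

Output-sensitive Sparse Polynomial GCD over Finite Fields is NP-hard

Ruichen Qiu, Yichuan Cao, Qiao-Long Huang, Ruyong Feng, Xiao-Shan Gao

AI summary: Proves that output-sensitive GCD computation for two sparse univariate polynomials over finite fields is NP-hard under BPP many-one reductions, so no randomized output-sensitive polynomial-time algorithm exists unless NP⊆BPP.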

Abstract

In this paper, we prove that output-sensitive sparse polynomial GCD computation over finite fields is NP-hard under BPP many-one reductions. More precisely, for two sparse univariate polynomials $f,g$ with finite field coefficients, under the standard complexity assumption $\mathrm{NP}\nsubseteq\mathrm{BPP}$ there exists no randomized algorithm that computes $\gcd(f,g)$ in time polynomial in the sizes of $f$, $g$, and $\gcd(f,g)$. This settles, in the finite field setting, the open problem posed as Challenge 5 in The Sparsity Challenges. Furthermore, we show that the Roots of Unity Detection problem over finite fields is NP-hard; that is, determining whether the GCD of a sparse univariate polynomial and $x^n - 1$ has nonzero degree is NP-hard.
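
To make the two problems in this abstract concrete, the following minimal Python sketch computes GCDs over a small prime field with the textbook dense Euclidean algorithm. It is not the paper's construction, and it runs in time polynomial in the degree rather than in the sparse size. The helper names (`from_sparse`, `poly_gcd`, `has_unity_root`), the field GF(11), and the example polynomials are illustrative assumptions. The point is only that the GCD of two very sparse inputs can have more terms than either input, which is why running time is measured against the output size as well.

```python
def trim(a):
    """Drop trailing zero coefficients in place and return the list."""
    while a and a[-1] == 0:
        a.pop()
    return a

def from_sparse(terms):
    """Expand {exponent: coefficient} into a dense list, lowest degree first."""
    dense = [0] * (max(terms) + 1)
    for exponent, coeff in terms.items():
        dense[exponent] = coeff
    return dense

def poly_mod(a, b, p):
    """Remainder of a divided by b over GF(p); b must be nonzero and reduced."""
    a = trim([c % p for c in a])
    inv_lead = pow(b[-1], -1, p)  # modular inverse (Python 3.8+)
    while len(a) >= len(b):
        factor = a[-1] * inv_lead % p
        shift = len(a) - len(b)
        for i, c in enumerate(b):
            a[shift + i] = (a[shift + i] - factor * c) % p
        trim(a)
    return a

def poly_gcd(a, b, p):
    """Monic GCD over GF(p) by the dense Euclidean algorithm (inputs not both zero)."""
    a = trim([c % p for c in a])
    b = trim([c % p for c in b])
    while b:
        a, b = b, poly_mod(a, b, p)
    inv = pow(a[-1], -1, p)
    return [c * inv % p for c in a]

def has_unity_root(f, n, p):
    """Roots of Unity Detection: does gcd(f, x^n - 1) have nonzero degree?"""
    return len(poly_gcd(f, from_sparse({n: 1, 0: -1}), p)) > 1

P = 11
f = from_sparse({35: 1, 0: -1})               # x^35 - 1, 2 terms
g = from_sparse({12: 1, 7: -1, 5: -1, 0: 1})  # (x^5 - 1)(x^7 - 1), 4 terms
d = poly_gcd(f, g, P)
gcd_degree = len(d) - 1
gcd_terms = sum(1 for c in d if c)
```

In this example the inputs have 2 and 4 terms, while the GCD, $(x^5-1)(x^7-1)/(x-1)$, has degree 11 and 10 nonzero terms.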

2606.12142 2026-06-11 cs.RO cs.CV New submission

AerialClaw: An Open-Source Framework for LLM-Driven Autonomous Aerial Agents

Ke Li, Jianfei Yang, Luyao Zhang, Guo Yu, Chengwei Yan, Yuan Ding, Di Wang, Nan Luo, Gang Liu, Xiao Gao, Quan Wang

Affiliations: Xidian University; Xi'an University of Architecture and Technology

AI summary: Presents AerialClaw, an open-source framework with a modular brain-skill-runtime architecture that lets LLM-based agents understand natural-language missions, invoke aerial skills, and make closed-loop decisions, improving the flexibility, reproducibility, and extensibility of UAV systems.

Abstract

Unmanned aerial vehicles (UAVs) are increasingly used in inspection, search and rescue, environmental monitoring, and emergency response. However, most UAV applications still rely on pre-defined command sequences or task-specific pipelines, where developers manually connect perception, planning, flight control, simulation, logging, and safety modules. This limits the flexibility, reproducibility, and extensibility of autonomous aerial systems. This paper presents AerialClaw, an open-source software framework that enables UAVs to operate as decision-making aerial agents rather than merely command-following platforms. Given a natural-language mission, AerialClaw allows an LLM-based agent to understand the task, maintain context, invoke executable aerial skills, observe perception and runtime feedback, and iteratively update its decisions in a closed loop. The framework adopts a modular brain-skill-runtime architecture, combining hard skills for atomic UAV operations, Markdown-based soft skills for reusable task strategies, document-driven agent state and capability boundaries, memory-driven reflection, safety-oriented runtime validation, and platform-agnostic execution adapters. AerialClaw supports lightweight mock execution, PX4 SITL with Gazebo, and AirSim-based simulation, together with a web console, pluggable model backends, example missions, simulation assets, and staged deployment scripts. By combining standardized aerial skills, document-driven agent state, memory, and closed-loop LLM decision-making, AerialClaw provides a reproducible and extensible open-source framework for building UAV systems that can interpret missions, make decisions, execute skills, and adapt their behavior from feedback.

2606.12141 2026-06-11 cs.LG New submission

PCA-Enhanced Adaptive NVAR Framework for High-Resolution Sea Surface Temperature Forecasting in the East Sea

Sherkhon Azimov, Susana López-Moreno, Eric Dolores-Cuenca, JinYong Choi, Sangil Kim

Affiliations: Pusan National University

AI summary: Proposes a PCA-enhanced Adaptive NVAR framework that combines SVD dimensionality reduction with Adaptive NVAR temporal modeling for efficient and accurate sea surface temperature forecasting in the East Sea, outperforming standard NVAR.

Comments: 14 pages, 7 figures

Abstract

Accurate forecasting of sea surface temperature (SST) in regional seas such as the East Sea is crucial for monitoring marine ecosystems, assessing climate risks, managing fisheries, and conducting naval operations. Traditional numerical ocean models provide reliable predictions but are computationally expensive and often unsuitable for real-time forecasting. Many deep learning methods also struggle with high-dimensional spatiotemporal ocean data and experience error accumulation over longer forecasting periods. This study builds on our previously proposed Adaptive Next-Generation Reservoir Computing (Adaptive NVAR) framework, initially introduced and tested on synthetic dynamical systems, and extends it to ocean forecasting. We present a reduced-order forecasting framework that combines Singular Value Decomposition (SVD) with Adaptive NVAR to predict SST dynamics in the East Sea. SST fields are compressed into a low-dimensional representation using SVD, which extracts dominant modes of ocean variability. Adaptive NVAR models the temporal evolution of these latent states, and the predicted states are reconstructed into SST forecasts. We evaluate the framework using regional ocean datasets and compare it with the standard NG-RC/NVAR. Results show that Adaptive NVAR consistently achieves lower forecasting errors across multiple prediction horizons. In addition, SVD reduces computational complexity, resulting in a fast and scalable framework suitable for real-time ocean forecasting.
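
The reduced-order pipeline described in this abstract (compress with SVD, model the latent states with NVAR, reconstruct) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' method: it uses a plain NVAR with constant, linear, and quadratic features fitted by ridge regression rather than the Adaptive NVAR variant, and the field is a synthetic travelling-wave signal rather than SST data. The names `lift` and `delay_embed`, the rank `r`, the delay `k`, and the ridge weight `lam` are all hypothetical choices.

```python
import numpy as np

def lift(lin):
    """NVAR feature map: constant, linear, and unique quadratic terms."""
    iu = np.triu_indices(lin.shape[1])
    quad = (lin[:, :, None] * lin[:, None, :])[:, iu[0], iu[1]]
    return np.hstack([np.ones((lin.shape[0], 1)), lin, quad])

def delay_embed(z, k):
    """Stack the k most recent latent states (newest first) for each time step."""
    T = z.shape[0]
    return np.hstack([z[k - 1 - j : T - 1 - j] for j in range(k)])

# Synthetic stand-in for an SST field: 200 grid points x 300 time steps.
x = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)[:, None]
t = np.arange(300)[None, :]
field = np.sin(x - 0.1 * t) + 0.5 * np.cos(2.0 * x + 0.05 * t)
train, truth = field[:, :-1], field[:, -1]

# 1) Compress the field with a truncated SVD.
r, k, lam = 4, 2, 1e-6
U, S, _ = np.linalg.svd(train, full_matrices=False)
scale = S[:r] / np.sqrt(train.shape[1])
z = (U[:, :r].T @ train).T / scale          # latent states, shape (T, r)

# 2) Fit the NVAR readout on the latent states by ridge regression.
F, Y = lift(delay_embed(z, k)), z[k:]
W = np.linalg.solve(F.T @ F + lam * np.eye(F.shape[1]), F.T @ Y)

# 3) Forecast one step ahead and reconstruct the full field.
last = np.hstack([z[-1 - j] for j in range(k)])[None, :]
z_next = (lift(last) @ W)[0]
forecast = U[:, :r] @ (z_next * scale)
rel_err = np.linalg.norm(forecast - truth) / np.linalg.norm(truth)
```

The regression here runs on `r = 4` latent coordinates instead of 200 grid points, which is the source of the computational savings the abstract attributes to SVD.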

2606.12140 2026-06-11 cs.CV New submission

Time-Conditioned and Multi-Time Survival Prediction from 2D PET/CT Projections in Lung Cancer

Ashish Chauhan, Sambit Tarai, Elin Lundström, Johan Öfverstedt, Håkan Ahlström, Joel Kullberg

Affiliations: Radiology, Department of Surgical Sciences, Uppsala University; National Academic Infrastructure for Supercomputing (NAISS), Linköping University; Antaros Medical; SciLifeLab, Uppsala University

AI summary: Proposes two approaches, Attention-guided Time-Conditioned Survival (ATCS) and Multi-Time Survival (MTS), that predict survival of non-small cell lung cancer patients from 2D PET/CT projections; ATCS is better at earlier time points and MTS at later ones.

Comments: Under review at MIUA 2026

Abstract

Accurate prediction of overall survival (OS) from positron emission tomography/computed tomography (PET/CT) can support personalized treatment and follow-up strategies in oncology. However, the impact of temporal modeling on imaging-based survival prediction remains insufficiently explored. We investigate how different temporal formulations influence survival prediction by developing two complementary approaches: Attention-guided Time-Conditioned Survival (ATCS) and Multi-Time Survival (MTS). We retrospectively analyzed pre-treatment PET/CT images from 848 patients with non-small cell lung cancer (NSCLC), including 556 for model development and 292 for held-out testing. A previously proposed Time-Conditioned Survival (TCS) model was used as a baseline. Models were trained using 5-fold cross-validation and evaluated on the test set using time-dependent area under the curve (AUC) at 6-month intervals from 0.5 to 5 years. Both ATCS and MTS outperformed the baseline TCS model, achieving mean AUCs of 0.794 and 0.793, respectively, compared to 0.767. ATCS performed better at earlier time points (0.5-3 years), whereas MTS performed better at later intervals (3.5-5 years). Combining tumor-specific and tissue-wise PET/CT features improved performance over either input alone. Finer temporal discretization improved short-term prediction, while coarser intervals provided more stable long-term estimates. These findings demonstrate that temporal modeling and input design influence PET/CT-based survival prediction. The proposed approaches enable time-specific survival estimation from pre-treatment imaging and may support improved risk stratification and clinical decision-making.

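2606.12139 2026-06-11 cs.IT New submission

Reconfigurable Antennas for Next-generation Mobile Communication Networks: A Comprehensive Survey and Tutorial

Yizhe Zhao, Long Zhang, Halvin Yang, Kun Yang, Rui Zhang, Lingyang Song, Yuanwei Liu

AI summary: Surveys reconfigurable antennas (fluid, movable, pinching, and reconfigurable holographic antennas) for 6G networks, covering channel modeling, performance analysis, resource allocation, and synergy with other technologies, compares the antenna types, and outlines future research directions.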

Comments: A Comprehensive Survey on Fluid Antennas, Movable Antennas, Pinching Antennas, and Holographic Antennas

Abstract

The transition to next-generation mobile communication networks, particularly 6G, demands advanced technologies to meet the requirements for ultra-reliable, low-latency communication, massive connectivity, and intelligent applications. Reconfigurable antennas (RAs) play a crucial role in achieving these objectives by enabling dynamic adjustments to the radio frequency (RF) characteristics of antennas, such as gain, radiation pattern, impedance, and polarization. Unlike traditional fixed-position antennas, RAs can alter both their radiation patterns and positions, offering flexibility in response to varying communication environments. This paper presents a comprehensive survey and tutorial on RAs, with a focus on fluid antennas (FAs), movable antennas (MAs), pinching antennas (PAs), and reconfigurable holographic antennas (RHAs), examining their potential in next-generation mobile networks. For each type of RA, we explore channel modelling and estimation, performance analysis, resource allocation strategies, and synergy with other emerging wireless technologies. Finally, we provide a comparative analysis of different RAs and discuss the open challenges and future research directions, offering insights and guidance for future investigations in this exciting research area.

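2606.12138 2026-06-11 cs.LG cs.AI cs.CL New submission

Unstable Features, Reproducible Subspaces: Understanding Seed Dependence in Sparse Autoencoders

Gleb Gerasimov, Timofei Rusalev, Nikita Balagansky, Daniil Laptev, Vadim Kurochkin, Daniil Gavrilov

Affiliations: T-Tech

AI summary: Studies the reproducibility of sparse autoencoder features and finds that stable features carry most of the signal, while unstable features concentrate in reproducible low-rank subspaces, reflecting basis ambiguity rather than pure noise.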

Abstract

Sparse autoencoders (SAEs) are widely used to interpret neural network representations, but their utility depends on whether the learned features are reproducible across training runs. We study this question through \emph{feature stability}: for each SAE feature, we estimate the probability that a similar feature reappears in an independently trained SAE. This yields a scalable per-feature signal that separates stable from unstable features. In a large-scale study across seeds, models, layers, dictionary sizes, and SAE variants, we find a pronounced functional asymmetry: stable features carry most of the reconstruction- and prediction-relevant signal, while unstable features have weak marginal impact and are dominated by low-frequency surface-form triggers in both activation statistics and automatic explanations. Geometrically, unstable features are individually non-reproducible but concentrate in reproducible lower-rank subspaces, suggesting that seed dependence often reflects basis ambiguity within a shared region of activation space rather than pure noise. A controlled synthetic model makes this mechanism explicit, showing that low-rank ground-truth features can be recovered at the subspace level while remaining non-identifiable as individual SAE latents across seeds. Finally, by pooling unique cross-seed features, we construct more stable SAEs while preserving explained variance in this setting. Together, these results show that unstable features are not merely failed or noisy latents: they have weak individual functional impact, but reflect reproducible low-dimensional structure that standard SAEs resolve differently across seeds.
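
The feature-stability idea in this abstract (how often a similar feature reappears in an independently trained SAE) can be approximated with a simple matching rule. The sketch below is an assumption-laden illustration, not the paper's estimator: it declares two features "similar" when the cosine similarity of their decoder directions exceeds a threshold of 0.7, and it replaces real SAE training with a hypothetical `train_stub` that reuses some shared directions and draws the rest at random.

```python
import numpy as np

def unit_rows(m):
    """Normalize each row (one feature direction per row) to unit length."""
    return m / np.linalg.norm(m, axis=1, keepdims=True)

def feature_stability(reference, others, threshold=0.7):
    """For each reference feature, the fraction of other dictionaries that
    contain a feature with cosine similarity above `threshold`."""
    ref = unit_rows(reference)
    hits = [(ref @ unit_rows(d).T).max(axis=1) > threshold for d in others]
    return np.mean(hits, axis=0)

rng = np.random.default_rng(0)
dim, n_shared, n_private = 64, 16, 16
shared = rng.normal(size=(n_shared, dim))

def train_stub():
    """Stand-in for one SAE training run: the shared features in a random
    order, followed by run-specific random features."""
    private = rng.normal(size=(n_private, dim))
    return np.vstack([shared[rng.permutation(n_shared)], private])

reference = np.vstack([shared, rng.normal(size=(n_private, dim))])
others = [train_stub() for _ in range(3)]
stability = feature_stability(reference, others)
```

In this toy setup the first 16 reference features reappear in every run and the last 16 in none, so the per-feature score separates the two groups cleanly; real SAE features would fall anywhere between 0 and 1.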

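2606.12136 2026-06-11 cs.NI New submission

Greenness-Driven Scheduling in Far Edge Kubernetes: A CODECO Evaluation

Kaikang Huang, Dalal Ali, Rute C. Sofia

AI summary: Studies how the Kubernetes CODECO framework uses cross-layer energy-aware scheduling to reduce the energy consumption of containerized applications across the IoT-Edge-Cloud continuum; experiments on ARM devices show savings of up to 11.01 mJ in computational energy and 4.14 mJ in network energy.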

Abstract

Energy consumption is an increasing concern in IoT-Edge-Cloud infrastructures, where containerized application orchestration must balance performance with sustainability. This paper investigates how the Kubernetes CODECO framework integrates cross-layer energy-awareness into scheduling decisions for containerized applications across the IoT-Edge-Cloud continuum. CODECO monitors energy at both the computational level, via Kepler, and at a network (IP) level, and uses these metrics to define greenness heuristics that guide pod placement decisions through its ILP-based scheduler. The approach is experimentally evaluated on a real-world far Edge testbed composed of ARM-based embedded devices, comparing CODECO against vanilla Kubernetes across multiple scenarios. The results show that CODECO consistently reduces the energy consumption of the cluster, with savings of up to 11.01 mJ in computational energy and 4.14 mJ in network transmission energy consumption at peak load, for a wide set of scenarios which combine different types of injected fault conditions, including CPU stress, asymmetric network delay, and bandwidth contention. A composite greenness score combining both energy dimensions provides a stable and consistent ranking of scheduling strategies across all conditions, demonstrating its suitability as a unified energy indicator for cluster-level orchestration decisions across the IoT-Edge-Cloud continuum.

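2606.12130 2026-06-11 cs.SC cs.CC New submission

Sparse Polynomial Divisibility Test over Finite Field is CoNP-hard

Yichuan Cao, Ruichen Qiu, Qiao-Long Huang, Ruyong Feng, Xiao-Shan Gao

AI summary: Proves that, under BPP many-one reductions, deciding whether a sparse polynomial does not divide another sparse polynomial over finite fields is NP-hard, i.e., the sparse polynomial divisibility test is CoNP-hard, resolving a long-standing open complexity question.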

Abstract

In this paper, we show that deciding whether a sparse polynomial does not divide another sparse polynomial exactly over finite fields is NP-hard under BPP many-one reductions. Equivalently, the sparse polynomial divisibility test over finite fields is CoNP-hard. This resolves the long-standing open problem concerning the computational complexity of the divisibility test for sparse polynomials in the setting of finite fields.

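2606.12128 2026-06-11 cs.CE New submission

From Agent Identity to Agent Economy: Measuring the Operational Readiness of ERC-8004 AI Agents

Rischan Mafrur, Priagung Khusumanegara

AI summary: Analyzes data on ERC-8004 agents on Ethereum and builds an operational readiness framework, finding that early adoption is registration-heavy but operationally shallow: the identity layer is visible, but metadata, service, reputation, and cross-chain evidence are limited, and ownership and feedback activity are highly concentrated, so the transition from agent identity to agent economy remains incomplete.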

Abstract

This paper examines whether blockchain-registered AI agents demonstrate operational readiness beyond identity registration. Using a dataset of ERC-8004 agents on Ethereum, we construct an agent-level feature table covering identity status, metadata, service declarations, reputation feedback, transfers, and cross-chain registration. We develop an operational readiness framework based on observable evidence layers and complement it with network analysis of owner-agent, feedback-client, wallet-transfer, and combined evidence relationships. The results show that early ERC-8004 adoption is registration-heavy but operationally shallow. While the identity layer is visible at scale, metadata availability, service exposure, reputation formation, and cross-chain evidence remain limited. Ownership and feedback activity are also highly concentrated, suggesting that early participation is shaped by a small number of high-activity wallets and clients. The network analysis further shows that richer operational evidence clusters around a small subset of agents rather than being broadly distributed across the ecosystem. The findings suggest that ERC-8004 provides an important identity layer for decentralized AI agents, but the transition from agent identity to agent economy remains incomplete.

2606.12126 2026-06-11 cs.CV New submission

AGE-MIL: Anchor-Guided Evidence Learning for Patient-Level Prediction

Jiawei Niu, Jian Chen, Di Zhang, Junbo Lu, Zhangcheng Liao, Xuhao Liu, Honglin Zhong, Mireia Crispin-Ortuzar, Chen Li, Zeyu Gao, Yi Cai

Affiliations: School of Computer Science and Technology, Xi’an Jiaotong University; Department of Oncology, University of Cambridge; Xiangya School of Medicine, Central South University

AI summary: Proposes the AGE-MIL framework, which builds patient-level anchors to integrate evidence from multiple whole-slide images and models risk as an evidence accumulation process, enabling stable optimization under weak supervision and outperforming eight existing methods on six tasks.

Comments: 11 pages, 2 figures, MICCAI early accepted

Abstract

Existing computational pathology methods predominantly operate within whole-slide image (WSI)-level multiple instance learning (MIL) paradigms, while patient-level modeling remains underexplored. In routine pathological practice, however, pathologists derive diagnostic and prognostic conclusions by integrating evidence across multiple WSIs rather than relying on any single slide. This discrepancy creates a fundamental misalignment when patient-level supervision is directly imposed on conventional MIL frameworks, often leading to unstable optimization and degraded predictive reliability. To address this issue, we propose Anchor-Guided Evidence MIL (AGE-MIL), a weakly supervised framework for patient-level prediction. AGE-MIL constructs a patient-level anchor from slide representations to capture global pathological context and guide the retrieval and integration of diagnostically relevant local patches, enabling robust patient-level modeling. Patient-level risk is further modeled as an evidence accumulation process, promoting stable optimization under weak supervision. AGE-MIL is evaluated on six clinically relevant patient-level prediction tasks from two independent cohorts. Experimental results show that the proposed framework consistently outperforms eight state-of-the-art MIL methods. Code is available at this https URL.

2606.12125 2026-06-11 cs.CV New submission

Q-Fold: Query-Aware Focus-Context Spatio-Temporal Folding for Long Video Understanding

Biao Tang, Xu Chen, Shuxiang Gou, Jingyi Yuan, Yuhan Zhang, Chenqiang Gao

Affiliations: Shenzhen Campus of Sun Yat-sen University; Harbin Institute of Technology, Shenzhen; Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China

AI summary: Proposes Q-Fold, a training-free input construction framework for long videos that, under query guidance, keeps relevant segments as high-fidelity focus frames and folds irrelevant segments into contextual layouts, improving the long-video understanding of multimodal large language models under a fixed budget.

Comments: 10 pages, 5 figures, 8 tables. Code will be made publicly available

Abstract

Long-video understanding remains challenging for multimodal large language models, because temporally extended videos often contain thousands of frames and are therefore expensive to process exhaustively. Existing methods usually construct compact visual inputs from long videos under a limited visual budget. However, most of them still follow a frame-centric paradigm and apply similar representations to retained content regardless of its importance. This makes it difficult to preserve both high-fidelity visual evidence and broad temporal coverage. To address this issue, we propose Q-Fold, a training-free input construction framework for long-video understanding. Instead of treating isolated frames as the basic modeling unit, Q-Fold operates on contiguous temporal segments and constructs a heterogeneous Focus--Context representation under query guidance. Query-relevant segments are preserved as high-fidelity Focus Frames, while less relevant segments are folded into chronology-preserving contextual layouts. In this way, Q-Fold preserves critical visual evidence and broad temporal coverage, while better maintaining local temporal continuity within short segments. Experiments on four long-video benchmarks with multiple Video-MLLMs show that Q-Fold consistently improves performance without increasing the input budget. Notably, it achieves gains of up to 9.1 percentage points on an ultra-long video benchmark. Code will be made publicly available.

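2606.12120 2026-06-11 cs.LG math.OC New submission

A Riemannian Approach to Low-Rank Optimal Transport

Pratik Jawanpuria, Bamdev Mishra

Affiliations: Centre for Machine Intelligence and Data Science, IIT Bombay; Microsoft India

AI summary: Proposes a Riemannian geometric framework for low-rank optimal transport that models balanced and unbalanced rank-$r$ positive factored couplings as smooth submanifolds equipped with the Fisher-Rao product metric, yielding efficient first- and second-order solvers that surpass existing methods in convergence speed and performance.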

Abstract

Low-rank optimal transport (OT) mitigates the quadratic scaling of classical solvers, yet existing approaches rely heavily on first-order mirror-descent updates that require careful hyperparameter tuning and ignore the optimization landscape's curvature. To address these limitations, we propose a unified Riemannian geometric framework for low-rank OT, modeling balanced and unbalanced rank-$r$ positive factored couplings as novel smooth embedded submanifolds of the positive orthant. By equipping these manifolds with the Fisher-Rao product metric, we derive tractable formulations for Riemannian projectors, retractions, and Hessian-vector products. Our cost-agnostic framework seamlessly extends to linear OT, Gromov-Wasserstein (GW), fused GW, and their unbalanced counterparts. For balanced OT, our geometric ingredients are computed via efficient conjugate-gradient and iterative Bregman updates. For the unbalanced OT, our operations elegantly reduce to closed-form scalings, completely eliminating inner iterative loops. In both regimes, per-iteration complexity scales linearly with dataset size, and we provide a rank-sufficiency certificate for global optimality verification. Extensive experiments across a range of problem sizes demonstrate that our regularization-free first- and second-order solvers achieve faster convergence and superior performance over existing state-of-the-art low-rank OT solvers.
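
As background for the "rank-$r$ positive factored couplings" mentioned in this abstract, the sketch below builds one such coupling and checks its properties. It assumes the factorization $P = Q\,\mathrm{diag}(1/g)\,R^\top$ that is common in the low-rank OT literature, with $Q$ coupling $a$ to an inner marginal $g$ and $R$ coupling $b$ to $g$; whether the paper uses exactly this parametrization is not stated in the abstract. It does not implement the Riemannian solver, and `sinkhorn_coupling` plus all sizes are hypothetical choices for illustration.

```python
import numpy as np

def sinkhorn_coupling(K, row_marginal, col_marginal, iters=200):
    """Scale a positive matrix K to a coupling with the given marginals."""
    u = np.ones_like(row_marginal)
    for _ in range(iters):
        v = col_marginal / (K.T @ u)
        u = row_marginal / (K @ v)
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
n, m, r, d = 30, 40, 3, 5
a = np.full(n, 1.0 / n)            # source marginal
b = np.full(m, 1.0 / m)            # target marginal
g = rng.dirichlet(np.ones(r))      # inner marginal of length r

# Positive factors with marginals (a, g) and (b, g).
Q = sinkhorn_coupling(rng.uniform(0.5, 1.5, (n, r)), a, g)
R = sinkhorn_coupling(rng.uniform(0.5, 1.5, (m, r)), b, g)

# The dense coupling is formed here only to verify its properties.
P = Q @ np.diag(1.0 / g) @ R.T

# With a factored cost C = A B^T, the transport cost <C, P> can be
# evaluated from the factors without forming the n x m matrices.
A = rng.normal(size=(n, d))
B = rng.normal(size=(m, d))
cost_dense = np.sum((A @ B.T) * P)
cost_factored = np.sum((A.T @ Q) / g * (B.T @ R))
```

Evaluating the cost from the factors touches only arrays of size $n \times r$, $m \times r$, and $d \times r$, which is the kind of linear scaling in dataset size that the abstract refers to.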

2606.12117 2026-06-11 cs.CL cs.AI New submission

Soft-Prompt Tuning for Fair and Efficient LLM Benchmark Evaluation

Selen Erkan, Bastian Boll, Kristian Kersting, Björn Deiseroth, Letitia Parcalabescu

Affiliations: Aleph Alpha Research Lab; TU Darmstadt; Hessian.AI

AI summary: Proposes soft-prompt tuning, which optimizes a small number of soft-prompt vectors to adapt base models to benchmark formats, evaluating their actual knowledge fairly and efficiently without full post-training.

Comments: 10 pages, 4 figures

Abstract

Benchmark scores often misrepresent a large language model's (LLM's) knowledge because they depend, among other things, on the model's ability to follow specific formatting requirements. This especially penalizes base models that may know the correct answers but lack the ability -- typically introduced in post-training -- to structure them as instructed. To overcome this, we propose soft-prompt tuning, an efficient, fair, and architecture-agnostic model evaluation method. By optimizing only 10 soft-prompt vectors (roughly 0.0006% of the parameters for a 7B model) over a short tuning period, we adapt models to specific benchmark formats, closing gaps in format-following and ensuring that underlying knowledge is accurately reflected in benchmark scores. This allows one to fairly compare different base models -- trained with various pre-training recipes -- on benchmarks without the need for full post-training. We evaluated soft-prompt tuning across 7 models and 7 datasets. The results show that (a) soft-prompt tuning saturates format-following within 80 steps (~640 samples), making it highly efficient; (b) soft-prompt tuning significantly outperforms zero- and few-shot prompting, surfacing base model knowledge that standard prompting misses; (c) even post-trained models can benefit from soft prompts to maximize format compliance; and (d) soft-prompted base model performance predicts post-trained model rankings more reliably than zero- and few-shot baselines, offering a low-cost proxy for downstream model quality. Our contributions include (1) metrics that disentangle format-following and knowledge accuracy, (2) a fairer benchmarking protocol for LLM knowledge, and (3) a cost- and memory-efficient recipe for identifying optimal pre-training strategies early in LLM development.
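
The parameter fraction quoted in this abstract can be checked with a few lines of arithmetic. The hidden size of 4096 is an assumption (a typical width for 7B-parameter models, not stated in the abstract), and the batch size is only what the rounded figures "80 steps" and "~640 samples" imply.

```python
hidden_size = 4096              # assumption: typical width of a 7B model
num_soft_prompt_vectors = 10    # from the abstract
total_params = 7e9              # "7B model"

# Each soft-prompt vector has one trainable value per hidden dimension.
trainable = num_soft_prompt_vectors * hidden_size
percent_trainable = 100 * trainable / total_params

steps, samples = 80, 640        # from the abstract (samples are approximate)
implied_batch_size = samples // steps
```

Under this assumption the 10 vectors hold 40,960 trainable values, about 0.000585% of 7 billion parameters, which is consistent with the "roughly 0.0006%" in the abstract.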

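2606.12114 2026-06-11 cs.CL New submission

Detecting Sensitive Personal Information in Japanese Pre-Training Corpora for Large Language Models

Rei Minamoto, Yusuke Oda, Daisuke Kawahara

Affiliations: Waseda University; Research and Development Center for LLMs, National Institute of Informatics

AI summary: Targets sensitive personal information in Japanese LLM pre-training corpora: based on the special care-required personal information (SCPI) defined in Japan's Act on the Protection of Personal Information, builds a dataset and trains machine learning models for fast detection, the first exploration of SCPI detection in Japanese text.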

Abstract

Sensitive personal information can appear in large-scale pre-training corpora for large language models (LLMs). Detecting and filtering such information is therefore essential to ensure compliance with privacy regulations and prevent unintended information leakage. However, in contrast to English and other languages, research into sensitive personal information has been limited in the Japanese language. In this study, we focus on sensitive personal data defined as special care-required personal information (SCPI) under Japan's Act on the Protection of Personal Information (APPI). We construct an SCPI dataset using LLM-based annotation and train machine learning models to rapidly detect SCPI in text. As a result, our SCPI classifier can effectively identify information related to SCPI. This study is the first to explore SCPI detection in Japanese text corpora, highlighting the challenges of accurate detection.

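2606.12113 2026-06-11 cs.CL cs.AI New submission

Augmenting Molecular Language Models with Local $n$-gram Memory

Xinni Zhang, Zijing Liu, He Cao, Yu Li, Irwin King

Affiliations: The Chinese University of Hong Kong; International Digital Economy Academy

AI summary: Addresses the problem that character-level tokenization in Transformer models for SMILES strings fragments chemically meaningful patterns by proposing the MolGram module, which injects local context through hash lookups into a conditional $n$-gram memory and surpasses larger baselines on three tasks.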

Abstract

Transformer-based language models for SMILES strings suffer from a locality gap: standard character-level tokenization fragments chemically meaningful motifs, forcing models to repeatedly learn local syntax at the expense of long-range dependencies. To address this without disrupting standard tokenizers, we propose MolGram, which integrates a conditional $n$-gram memory module into molecular language models. MolGram maps local string patterns to learned embeddings via scalable hash lookups and dynamically injects this regional context into hidden states. Evaluations across three tasks, including unconditional molecule generation, forward reaction prediction, and single-step retrosynthesis, show that MolGram consistently improves performance. Crucially, our analyses demonstrate that MolGram outperforms baselines with 3$\times$ more parameters, establishing explicit local pattern memory as a highly efficient inductive bias.
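
The mechanism described in this abstract, mapping local string patterns to embeddings through hash lookups and adding them to hidden states, can be sketched as follows. This is a minimal illustration and not MolGram itself: the CRC32 hash, the table size, the fixed scalar `gate` (standing in for whatever conditioning the real module learns), and the function names are all assumptions, and the embedding table is random rather than trained.

```python
import zlib
import numpy as np

def ngram_bucket(tokens, pos, n, num_buckets):
    """Hash the n-gram ending at `pos` into a bucket index."""
    gram = "".join(tokens[max(0, pos - n + 1) : pos + 1])
    return zlib.crc32(gram.encode("utf-8")) % num_buckets

def inject_local_memory(hidden, tokens, table, n=3, gate=0.5):
    """Add the hashed n-gram embedding for each position to its hidden state."""
    idx = [ngram_bucket(tokens, i, n, len(table)) for i in range(len(tokens))]
    return hidden + gate * table[idx]

rng = np.random.default_rng(0)
tokens = list("c1ccccc1")                      # benzene, character-level tokens
table = rng.normal(size=(1024, 8))             # hypothetical memory table
hidden = np.zeros((len(tokens), 8))            # placeholder hidden states
out = inject_local_memory(hidden, tokens, table)
```

Positions 4, 5, and 6 all end the trigram `ccc`, so they receive the same memory vector regardless of where they sit in the string; the tokenizer itself is left untouched.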

2606.12112 2026-06-11 cs.RO 新提交

PEBRE: An Open-Hardware Compute and Perception Add-On for the Pepper Robot

PEBRE:Pepper 机器人的开源硬件计算与感知扩展模块

Malte Kuhlmann, Ignacio Bugueno-Cordova, Emil Alms, Javier Ruiz-del-Solar, Nicolás Navarro-Guerrero

发表机构 * Leibniz Universität Hannover(莱布尼茨汉诺威大学) University of Chile(智利大学)

AI总结 本文提出 PEBRE,一种为 Pepper 机器人设计的开源硬件扩展模块,通过集成 Jetson Orin Nano 等组件显著提升其计算与感知能力,并延长平台使用寿命。

AI中文摘要

本文介绍了 PEBRE 的设计、开发与实验验证,PEBRE 是一种用于 Pepper 机器人快速软件开发的开放硬件扩展模块。我们的项目通过集成 Jetson Orin Nano、Logitech BRIO、Intel RealSense D435i、Samson UB1 和 RØDE VideoMicro II 等外部组件,增强了 Pepper 的计算和感知能力。结果表明,新硬件显著提升了 Pepper 的感知能力和计算性能。这一开发通过为 Pepper 机器人实现开放硬件和开源模块化扩展模块,并保持这一相关研究平台在其预期寿命之外的功能性,为社区做出了贡献。通过 PEBRE,我们旨在促进更快速的软件开发以及外部组件的更高效集成,最终增强 Pepper 机器人的能力。

英文摘要

This paper presents the design, development, and experimental verification of PEBRE, an open-hardware add-on for fast software development on the Pepper Robot. Our project enhances Pepper's computational and perception capabilities by integrating external components such as a Jetson Orin Nano, Logitech BRIO, Intel RealSense D435i, Samson UB1, and RØDE VideoMicro II. Our results show that the new hardware considerably improved Pepper's perception abilities and computational power. This development contributes to the community by implementing an open hardware and open-source modular add-on to the Pepper robot and keeping this relevant research platform functional beyond its expected lifespan. With PEBRE, we aim to facilitate faster software development and more efficient integration of external components, ultimately enhancing the capabilities of the Pepper robot.

2606.12109 2026-06-11 cs.RO cs.AI 新提交

Bridging the Morphology Gap: Adapting VLA Models to Dexterous Manipulation via Intent-Conditioned Fine-Tuning

弥合形态差距:通过意图条件微调使VLA模型适应灵巧操作

Chuanke Pang, Junyi Huang, Zhijun Zhao, Yaobing Wang, Kun Xu, Xilun Ding

发表机构 * Beihang University(北京航空航天大学) China Academy of Space Technology(中国空间技术研究院)

AI总结 提出InDex框架,通过将预训练的1-DoF平行抓取输出重用作宏观虚拟抓取意图代理,结合两阶段解耦学习架构,实现VLA模型从低自由度夹爪到高自由度灵巧手的适应,有效缓解灾难性遗忘和动作流形坍缩。

AI中文摘要

视觉-语言-动作(VLA)模型在机器人操作中展现了显著的零样本泛化能力,然而绝大多数预训练流程严格局限于低自由度平行夹爪。将这些丰富的语义先验适应到高自由度灵巧手引入了严重的形态差距,直接的端到端联合微调会由于数据稀缺而导致空间推理的灾难性遗忘和急性动作流形坍缩。在本文中,我们提出了InDex,一种新颖的、数据高效的适应框架,其根植于跨形态语义继承。我们不丢弃预训练的1-DoF平行抓取输出,而是将其重新用作连续的、宏观的虚拟抓取意图代理,以顺序化控制拓扑。我们实现了一个两阶段解耦学习架构:第一阶段参数高效地将VLA主干对齐以预测连续的臂轨迹和标量抓取意图;第二阶段冻结该空间主干,并利用一个意图条件去噪扩散头来解码多指末端执行器的细粒度关节运动。跨一系列多阶段、高接触灵巧操作任务的广泛模拟基准测试表明,InDex能够以最少的演示数据有效掌握复杂技能,显著优于整体基线,同时保留了原始VLA先验的鲁棒空间泛化能力。

英文摘要

Vision-Language-Action (VLA) models have demonstrated remarkable zero-shot generalization in robotic manipulation, yet the vast majority of pre-trained pipelines remain strictly confined to low-DoF parallel grippers. Adapting these rich semantic priors to high-DoF dexterous hands introduces a severe morphology gap: direct end-to-end joint fine-tuning causes catastrophic forgetting of spatial reasoning and acute action manifold collapse due to data scarcity. In this paper, we present InDex, a novel, data-efficient adaptation framework rooted in cross-morphology semantic inheritance. Rather than discarding the pre-trained 1-DoF parallel grasp output, we repurpose it as a continuous, macroscopic virtual grasp intent proxy to sequentialize the control topology. We implement a two-stage decoupled learning architecture: the first stage parameter-efficiently aligns the VLA backbone to predict continuous arm trajectories and the scalar grasp intent; the second stage freezes this spatial backbone and leverages an intent-conditioned denoising diffusion head to decode fine-grained joint articulations for multi-fingered end-effectors. Extensive simulation benchmarks across a suite of multi-stage, contact-rich dexterous manipulation tasks demonstrate that InDex effectively masters intricate skills with minimal demonstration data, substantially outperforming monolithic baselines while preserving the robust spatial generalizability of the original VLA prior.

2606.12106 2026-06-11 cs.CV cs.AI 新提交

MSUE: Multi-Modal Soccer Understanding Expert

MSUE:多模态足球理解专家

Litao Li, Yibo Yu, Yufeng Hu, Zhuo Yang, Jiali Wen, Yixin Chen, Yixi Zhou

发表机构 * South China University of Technology(华南理工大学) Johns Hopkins University(约翰霍普金斯大学) Peking University(北京大学) University of Electronic Science and Technology of China(电子科技大学)

AI总结 提出MSUE多专家问答架构,结合VLM数据合成管道与LLM动态调度文本、图像、视频专家,在SoccerNet VQA挑战中达到0.95准确率,获第三名。

Comments
6 pages, 1 figure
AI中文摘要

本文介绍了我们对2026年SoccerNet VQA挑战赛的解决方案。我们首先开发了一个由视觉语言模型(VLM)驱动的低成本数据合成管道,该系统将原始领域数据系统地重构为多样化的VQA样本,包括简洁答案和长文本回复。其次,我们提出了MSUE,一种多专家问答架构,采用大语言模型(LLM)将问题动态分发给文本、图像和视频专家。这些专家分别实例化为强大的文本基线Gemini3-Flash、微调的Qwen3-VL和外部知识库,协同工作以提升VQA性能。MSUE在挑战基准上达到了0.95的准确率,在排行榜上获得第三名。

英文摘要

This paper presents our solution to the 2026 SoccerNet VQA Challenge. We first develop a cost-effective data synthesis pipeline driven by a Vision-Language Model (VLM), which systematically restructures raw domain data into diverse VQA samples, including concise answers and long-form responses. Second, we propose MSUE, a multi-expert question answering architecture that employs a Large Language Model (LLM) to dynamically dispatch questions to text, image, and video experts. These experts are instantiated as a strong text baseline Gemini3-Flash, a fine-tuned Qwen3-VL, and an external knowledge base, respectively, working collaboratively to enhance VQA performance. MSUE achieves an accuracy of 0.95 on the challenge benchmark, securing third place on the leaderboard.

2606.12105 2026-06-11 cs.RO cs.CV cs.LG 新提交

DAM-VLA: Decoupled Asynchronous Multimodal Vision Language Action model

DAM-VLA: 解耦异步多模态视觉语言动作模型

Pankhuri Vanjani, Zhuoyue Li, Jakub Suliga, Moritz Reuss, Gianluca Geraci, Xinkai Jiang, Rudolf Lioutikov

发表机构 * Intuitive Robots Lab, Karlsruhe Institute of Technology (KIT)(直觉机器人实验室,卡尔斯鲁厄理工学院) NVIDIA(英伟达) Robotics Institute of Germany(德国机器人研究所)

AI总结 针对VLA模型同步时钟与物理交互中不同模态频率不匹配的问题,提出DAM-VLA,通过解耦各模态时间处理、维护传感器速率更新的潜在缓冲区,并利用门控交叉注意力整合高频模态,在7个真实操作任务中平均成功率提升至95.2%。

Comments
17 pages, 8 figures
AI中文摘要

视觉-语言-动作(VLA)模型继承了视觉-语言预训练中的共享同步时钟,以单一速率处理每个输入。这与物理交互不一致,在物理交互中,高频模态以数百赫兹变化,视觉演化较慢,而语言在整个回合中保持不变。同步VLA会过采样慢速模态,欠采样快速模态,并将动作生成限制在最低有效频率。我们假设解耦每个模态的时间处理,让每个模态以其自身传感器速率更新和保留信息,可以产生更强的表示和更鲁棒的控制。我们提出DAM-VLA,它维护每个模态的潜在缓冲区,以传感器速率刷新并由动作头连续读取,通过门控交叉注意力整合新的高频模态,同时保持预训练主干不变。在七个接触丰富的真实世界操作任务中,DAM-VLA将最强同步基线的平均成功率提高了一倍以上(95.2% vs. 40.95%),同时维持平滑、反应式的100 Hz控制。项目网站:this https URL

英文摘要

Vision-language-action (VLA) models inherit a shared synchronous clock from vision-language pretraining, processing every input at one rate. This is misaligned with physical interaction, where a high-frequency modality changes at hundreds of hertz, vision evolves more slowly, and language stays constant across an episode. A synchronous VLA oversamples slow modalities, undersamples fast ones, and caps action generation at the lowest effective frequency. We hypothesize that decoupling temporal processing per modality, letting each update and retain information at its own sensor rate, yields stronger representations and more robust control. We present DAM-VLA, which maintains per-modality latent buffers refreshed at sensor rates and read continuously by the action head, integrating new high-frequency modalities through gated cross-attention that leaves the pretrained backbone intact. Across seven contact-rich real-world manipulation tasks, DAM-VLA more than doubles the average success rate of the strongest synchronous baseline (95.2% vs. 40.95%) while sustaining smooth, reactive 100 Hz control. Project website: this https URL
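The per-modality buffering described above can be sketched as a small single-threaded simulation. This is only an illustration under stated assumptions: `LatentBuffer`, `simulate`, the sensor names, and the 500 Hz and 20 Hz rates are hypothetical, a float stands in for a latent vector, and the gated cross-attention and action head are not modelled at all.

```python
from dataclasses import dataclass


@dataclass
class LatentBuffer:
    """Hypothetical per-modality buffer refreshed at its own sensor period."""
    period_ms: int
    value: float = 0.0       # stand-in for a latent vector
    updated_at_ms: int = -1  # timestamp of the last refresh

    def maybe_refresh(self, t_ms, read_sensor):
        # Refresh only on ticks that match this modality's sensor rate.
        if t_ms % self.period_ms == 0:
            self.value = read_sensor(t_ms)
            self.updated_at_ms = t_ms


def simulate(duration_ms=100, control_period_ms=10):
    """Return (time, force buffer age stamp, vision buffer age stamp) per control tick."""
    buffers = {
        "force": LatentBuffer(period_ms=2),    # assumed 500 Hz sensor
        "vision": LatentBuffer(period_ms=50),  # assumed 20 Hz camera
    }
    log = []
    for t in range(duration_ms):
        for buf in buffers.values():
            buf.maybe_refresh(t, lambda t_ms: float(t_ms))
        if t % control_period_ms == 0:  # 100 Hz control tick reads whatever is current
            log.append((t, buffers["force"].updated_at_ms,
                        buffers["vision"].updated_at_ms))
    return log


log = simulate()
```

At each control tick the force buffer is fresh while the vision buffer may be tens of milliseconds old, which is the point of the design: the controller never waits for the slowest modality.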

2606.12103 2026-06-11 cs.DC 新提交

The PM-EdgeMap: Towards Real-Time Process Mining on the Edge-Cloud Continuum

PM-EdgeMap:迈向边缘-云连续体上的实时过程挖掘

Hendrik Reiter, Christian Imenkamp, Olaf Landsiedel, Andrea Maldonado, Patrick Rathje, Wilhelm Hasselbring

AI总结 提出PM-EdgeMap框架,在边缘-云连续体上实现实时过程挖掘,通过边缘一致性检查算法验证可行性,提升智能工厂自主控制能力。

AI中文摘要

智能工厂正在演变为网络物理系统(CPS),要求更高的自主性。这需要基于传感器数据洞察的实时决策。过程挖掘提供了一种获取此类洞察并指导行动的有价值方法。边缘计算范式通过实现传感器之间的网络通信并利用附近的计算资源来支持这一实时需求。本文研究了在边缘上执行实时过程挖掘算法的影响。在本文中,我们首先提出了一种形式化方法来描述相关数据集和计算拓扑。然后,我们通过一个涉及基于边缘的一致性检查算法的案例研究来评估边缘计算方法。结果证明了基于边缘的实时过程挖掘在增强智能工厂自主控制方面的可行性和优势。

英文摘要

Smart factories are evolving into Cyber-Physical Systems (CPS), demanding increased autonomy. This necessitates real-time decision making, facilitated by insights derived from sensor data. Process mining offers a valuable approach to gain such insights and guide actions. The edge computing paradigm supports this real-time requirement by enabling network communication between sensors and leveraging nearby computing resources. This paper investigates the implications of performing real-time process mining algorithms on the edge. Within this paper, we first propose a formalism to describe relevant datasets and the computing topology. We then evaluate the edge computing approach through a case study involving an edge-based conformance checking algorithm. The results demonstrate the feasibility and benefits of edge-based real-time process mining for enhanced autonomous control in smart factories.
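To make the idea of edge-side conformance checking concrete, here is a minimal sketch that checks a stream of events against a directly-follows model. The abstract does not say which conformance checking algorithm the case study uses, so this technique is a simple stand-in chosen for illustration; `StreamingConformanceChecker`, the activity names, and the case identifiers are all hypothetical.

```python
from collections import defaultdict


class StreamingConformanceChecker:
    """Checks each incoming event against an allowed directly-follows relation."""

    def __init__(self, allowed_follows, start_activities):
        self.allowed = set(allowed_follows)      # permitted (previous, next) pairs
        self.start = set(start_activities)       # activities that may open a case
        self.last = {}                           # last activity seen per case
        self.deviations = defaultdict(int)       # deviation count per case

    def observe(self, case_id, activity):
        """Process one event; return True if it conforms to the model."""
        prev = self.last.get(case_id)
        if prev is None:
            ok = activity in self.start
        else:
            ok = (prev, activity) in self.allowed
        if not ok:
            self.deviations[case_id] += 1
        self.last[case_id] = activity
        return ok


checker = StreamingConformanceChecker(
    allowed_follows=[("load", "drill"), ("drill", "inspect")],
    start_activities=["load"],
)
events = [("c1", "load"), ("c2", "load"), ("c1", "drill"),
          ("c2", "inspect"), ("c1", "inspect")]
results = [checker.observe(case, act) for case, act in events]
```

The checker keeps only the last activity per case, so its memory footprint grows with the number of open cases rather than with the length of the event log, which is what makes this style of check plausible on a constrained edge node.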

2606.12100 2026-06-11 cs.SC cs.CC 新提交

Quasi-linear Time Multiplication of Sparse Polynomials with Integer Coefficients

整数系数稀疏多项式的拟线性时间乘法

Qiao-Long Huang, Yichuan Cao, Ruichen Qiu, Xiao-Shan Gao

AI总结 针对整数系数稀疏多项式乘法,借助已有的拟线性模黑盒插值算法实现拟线性位复杂度,并给出反例反驳此前声称的解决方案。

AI中文摘要

稀疏多项式乘法是计算机代数和计算理论中的一个基本问题,开发拟线性时间输出敏感的乘法算法一直是一个公开挑战。本文针对整数系数情况,为先前声称的该公开问题的解决方案提供了一个反例。通过采用现有的拟线性模黑盒插值算法,我们能够为整数系数设置提供具有拟线性位复杂度的算法。此外,在系数属于有限域的情况下,我们获得了一个位复杂度与项数、次数的对数以及有限域大小的对数成线性关系的算法。

英文摘要

Sparse polynomial multiplication is a fundamental problem in computer algebra and the theory of computation, and the development of a quasi-linear time output-sensitive multiplication algorithm has been posed as an open challenge. In this paper, a counterexample is provided to a previously claimed solution to this open problem for integer coefficients. By employing the existing quasi-linear modular-black-box interpolation algorithm, we are able to provide an algorithm with quasi-linear bit complexity for the integer coefficients setting. Furthermore, in the case of coefficients over a finite field, we obtain an algorithm whose bit complexity is linear in the number of terms, the logarithm of the degree, and the logarithm of the size of the finite field.
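For context on what "output-sensitive" means here, the following sketch shows the classical term-by-term product on a sparse representation. It is the quadratic baseline, not the quasi-linear algorithm of the paper; the function name `sparse_mul` and the `{exponent: coefficient}` dictionary encoding are choices made for this example.

```python
def sparse_mul(f, g, p=None):
    """Multiply sparse polynomials stored as {exponent: coefficient}.

    If p is given, coefficients are reduced modulo p (finite field F_p for prime p).
    Cost is len(f) * len(g) coefficient products regardless of the output size.
    """
    out = {}
    for ef, cf in f.items():
        for eg, cg in g.items():
            e = ef + eg
            c = out.get(e, 0) + cf * cg
            if p is not None:
                c %= p
            if c:
                out[e] = c
            else:
                out.pop(e, None)  # drop terms that cancel to zero
    return out


# (1 + x + x^2 + x^3) * (1 - x) = 1 - x^4: eight products, two output terms.
f = {0: 1, 1: 1, 2: 1, 3: 1}
g = {0: 1, 1: -1}
product = sparse_mul(f, g)
product_mod7 = sparse_mul(f, g, p=7)

# Exponents can be huge without affecting cost, since only nonzero terms are stored.
big = sparse_mul({0: 1, 10**18: 1}, {0: 1, 10**18: 1})
```

The first example performs eight coefficient products to produce two terms, which is the gap an output-sensitive algorithm aims to close: its cost should scale with the sizes of the inputs and the output rather than with the product of the input term counts.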

2606.12099 2026-06-11 cs.CV 新提交

ISAP-3D: Identity-Slot Aligned Part-Aware 3D Generation

ISAP-3D: 身份槽对齐的部件感知3D生成

Junlin Hao, Haoshuai Fu, Xibin Song, Wei Li, Ruigang Yang, Xinggong Zhang, Jinchuan Zhang

发表机构 * Peking University(北京大学) Tencent(腾讯) Huawei(华为) University of Science and Technology of China(中国科学技术大学)

AI总结 针对部件感知3D生成中因身份-布局纠缠导致的结构歧义问题,提出身份槽对齐框架ISAP-3D,通过语义身份令牌锚定每个部件并进行一对一布局预测,实现稳定可控的部件级3D生成。

AI中文摘要

部件感知3D生成旨在合成具有语义意义组件的结构化对象,但由于身份-布局纠缠,常常遭受结构歧义。现有方法要么隐式推断部件身份和空间布局,导致不稳定的部件分配(例如槽交换或部件合并),要么依赖在实践中难以获得的强布局条件。我们将这种歧义归因于身份槽置换自由度:没有显式的身份槽对齐,训练期间语义部件和生成槽之间的对应关系不可识别,允许多个槽分配适应相同的监督,导致不一致的分解。基于这一见解,我们认为稳定的部件感知生成需要身份对齐的一对一槽建模。因此,我们提出了一个身份槽对齐框架ISAP-3D,该框架用语义身份令牌锚定每个部件,执行身份条件的一对一布局预测,随后进行布局条件的几何合成。结构化的局部-全局条件在语义、空间和几何阶段保持身份对齐。我们还构建了一个具有统一语义协议的部件级数据集,以实现可学习且一致的身份槽对齐。大量实验表明,与最先进的部件感知生成基线相比,我们的方法在结构稳定性、可控性和鲁棒性方面有所改进。

英文摘要

Part-aware 3D generation aims to synthesize structured objects with semantically meaningful components, yet often suffers from structural ambiguity due to identity-layout entanglement. Existing methods either infer part identity and spatial layout implicitly, which can lead to unstable part allocation (e.g., slot swapping or part merging), or rely on strong layout conditions that are difficult to obtain in practice. We attribute this ambiguity to identity-slot permutation freedom: without explicit identity-slot alignment, the correspondence between semantic parts and generation slots is not identifiable during training, allowing multiple slot assignments to fit the same supervision and leading to inconsistent decomposition. Based on this insight, we argue that stable part-aware generation requires identity-aligned one-to-one slot modelling. We therefore propose an identity-slot aligned framework, ISAP-3D, which anchors each part with semantic identity tokens and performs identity-conditioned one-to-one layout prediction, followed by layout-conditioned geometry synthesis. Structured local-global conditioning maintains identity alignment across semantic, spatial, and geometric stages. We also construct a part-level dataset with a unified semantic protocol to enable learnable and consistent identity-slot alignment. Extensive experiments demonstrate improved structural stability, controllability, and robustness over state-of-the-art part-aware generation baselines.

2606.12088 2026-06-11 cs.CL 新提交

Debiasing Without Protected Attributes: Latent Concept Erasure from Textual Profiles

无保护属性的去偏:从文本画像中消除潜在概念

Shun Shao, Zheng Zhao, Anna Korhonen, Yftah Ziser, Shay B. Cohen

发表机构 * University of Cambridge(剑桥大学) University of Edinburgh(爱丁堡大学) University of Groningen(格罗宁根大学) NVIDIA Research(英伟达研究院)

AI总结 提出H-SAL方法,利用自我描述文本作为隐式信号进行后处理概念和属性消除,在无直接敏感属性下实现去偏,并在多领域Stack Exchange基准上验证其效果与显式标签去偏相当或更优。

Comments
23 pages, 5 figures, 12 tables. The paper is currently under review
AI中文摘要

大多数自然语言处理中的公平性研究假设可以直接访问性别、种族或国籍等保护属性。然而,在实践中,由于隐私限制、元数据缺失或法律约束,这些信息通常不可用,尽管模型可能从间接文本线索中推断出来。这引发了一个关键问题:在没有直接访问敏感属性的情况下,去偏能否成功?我们提出了H-SAL,它利用自我描述文本作为隐式去偏信号,执行事后概念和属性消除。为了支持这一设置,我们引入了一个基于Stack Exchange的多领域公平性基准,用于有用性预测,该基准包括显式和隐式信号,从而能够在有保护标签的标准去偏和无敏感信息访问的去偏之间进行比较。在编码器和仅解码器语言模型中,我们发现隐式自我描述通常匹配或优于基于显式标签的去偏。我们的结果拓宽了表示层面的公平性研究,并为在现实数据约束下研究去偏提供了新的基准。

英文摘要

Most fairness research in NLP assumes direct access to protected attributes such as gender, race, or nationality. In practice, however, such information is often unavailable due to privacy constraints, missing metadata, or legal restrictions, even though models may infer it from indirect textual cues. This raises a key question: can debiasing succeed without direct access to sensitive attributes? We propose H-SAL, which performs post-hoc concept and attribute erasure using self-description text as an implicit debiasing signal. To support this setting, we introduce a multi-domain Stack Exchange-based fairness benchmark for helpfulness prediction that includes both explicit and implicit signals, enabling comparison between standard debiasing with protected labels and debiasing without access to sensitive information. Across encoder and decoder-only language models, we find that implicit self-description often matches or outperforms explicit-label-based debiasing. Our results broaden representation-level fairness research and provide a new benchmark for studying debiasing under realistic data constraints.

2606.12087 2026-06-11 cs.CL 新提交

FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents

FORT-Searcher:合成抗捷径搜索任务以训练深度搜索智能体

Jia Deng, Yimeng Chen, Xiaoqing Xiang, Ziyang Zeng, Shuo Tang, Wayne Xin Zhao, Feng Chang, Chuan Hao, Yuan Wei, Ran Tao, Bryan Dai, Ji-Rong Wen

发表机构 * Gaoling School of Artificial Intelligence Renmin University of China(中国人民大学高瓴人工智能学院) KAUST(阿卜杜拉国王科技大学) IQuest Research(IQuest研究院) Shanghai Jiao Tong University(上海交通大学)

AI总结 提出FORT框架,通过控制四种捷径风险合成抗捷径训练数据,使搜索智能体进行更长的预答案搜索,减少捷径模式,仅用SFT训练即达到最优性能。

Comments
30 pages
AI中文摘要

训练深度搜索智能体需要可验证的问题,其答案只有在通过搜索获得足够证据后才可用。现有的合成方法通常通过丰富图结构来增加表面难度,但仅凭结构复杂性并不能保证实现实际的搜索难度:预期的搜索过程可能通过更便宜的识别路径崩溃。我们用一个捷径感知的难度框架形式化了这一差距,并识别了四种可操作的捷径风险:证据共覆盖、单线索选择性、暴露常数和先验知识绑定。为了诊断它们的实际效果,我们使用轨迹签名,包括求解成本、答案命中时间和先验捷径率。在此框架的指导下,我们引入了FORT,一个抗捷径训练数据合成框架。FORT通过控制实体选择、证据图构建、问题表述和对抗性细化中的捷径风险来构建抗捷径训练数据。实验表明,与现有的开源深度搜索数据集相比,FORT诱导了更长的预答案搜索和更少的捷径模式。使用由此产生的轨迹,我们仅通过监督微调(SFT)训练FORT-Searcher,并在具有挑战性的深度搜索基准上取得了可比大小的开源搜索智能体中最佳的整体性能。相关资源将在 this https URL 上提供。

英文摘要

Training deep search agents requires verifiable questions whose answers remain unavailable until sufficient evidence has been acquired through search. Existing synthesis methods often increase apparent difficulty by enriching graph structures, but structural complexity alone does not guarantee realized search difficulty: the intended search process can collapse through a cheaper identifying route. We formalize this gap with a shortcut-aware difficulty framework and identify four actionable shortcut risks: evidence co-coverage, single-clue selectivity, exposed constants, and prior-knowledge binding. To diagnose their realized effects, we use trajectory signatures including solving cost, answer hit time, and prior-shortcut rate. Guided by this framework, we introduce FORT, a Framework of Shortcut-Resistant Training-Data Synthesis. FORT constructs shortcut-resistant training data by controlling shortcut risks across entity selection, evidence graph construction, question formulation, and adversarial refinement. Experiments show that FORT induces longer pre-answer search and fewer shortcut patterns than existing open-source deep search datasets. Using the resulting trajectories, we train FORT-Searcher with supervised fine-tuning (SFT) only, and it achieves the best overall performance among comparable-size open-source search agents on challenging deep search benchmarks. Relevant resources will be made available at this https URL.

2606.12086 2026-06-11 cs.AI cs.LG 新提交

IntElicit: Eliciting and Assessing Contextualized Creativity via Dialogue Policy Optimization

IntElicit: 通过对话策略优化引出和评估情境化创造力

Mingjia Li, Jin Wu, Hong Qian, Wenhao Huang, Yiyang Huang, Yiwen Zhang, Chanjin Zheng, Xiangfeng Wang, Aimin Zhou, Jiajun Guo

发表机构 * East China Normal University(华东师范大学) Shanghai Innovation Institute(上海创新研究院)

AI总结 提出IntElicit框架,通过分解过程奖励机制优化对话策略,在交互中减少非创造性混淆因素,从而更有效地引出和评估情境化创造力。

AI中文摘要

情境化评估为评估创造力提供了高生态效度,但也引入了一个关键挑战:观察到的表现可能与认知熟练度(领域知识)和能动性(参与意愿)相混淆。同时,在生成式AI时代,创造性问题解决越来越多地发生在工具中介和人机交互环境中,使得完全静态的评估与当代创造性实践不太一致。为了解决这些问题,本文提出了IntElicit,一个通过对话策略优化来引出和评估情境化创造力的框架。IntElicit作为一个受约束的自适应AI面试官:它在多轮交互中提供非指导性的知识和能动性支架,以减少非创造性混淆因素,同时保留参与者生成被评估的创造性内容的责任。具体来说,为了解决开放教育对话中的稀疏奖励和潜在奖励破解(例如,答案听写),IntElicit引入了一种分解过程奖励机制。该机制将策略与教学引出对齐,奖励那些引出参与者推理而非代表他们产生最优答案的提示。大量实验,包括参与者模拟和一项人类受试者研究(N=64),表明IntElicit比专家设计的基线提高了引出的创造性成果。总之,结果表明,交互式引出可以揭示静态FPSP式评估可能遗漏的创造性潜力,为AI中介学习环境中的情境化创造力评估提供了形成性和诊断性视角。

英文摘要

Contextualized assessment offers high ecological validity for evaluating creativity but introduces a critical challenge: observed performance may be confounded with cognitive proficiency (domain knowledge) and agency (willingness to engage). Meanwhile, in the age of generative AI, creative problem solving increasingly occurs in tool-mediated and human-AI interactive environments, making fully static assessment less aligned with contemporary creative practice. To address these issues, this paper proposes IntElicit, a framework for eliciting and assessing contextualized creativity via dialogue policy optimization. IntElicit functions as a constrained adaptive AI Interviewer: it provides non-directive knowledge and agency scaffolds in multi-turn interaction to reduce non-creative confounders, while preserving participants' responsibility for generating the creative content being evaluated. Specifically, to tackle sparse rewards and potential reward hacking (e.g., answer dictation) in open-ended educational dialogue, IntElicit introduces a decomposed process reward mechanism. This mechanism aligns the policy with pedagogical elicitation, rewarding prompts that draw out participant reasoning rather than producing optimal answers on their behalf. Extensive experiments, including participant simulation and a human subject study (N=64), show that IntElicit improves elicited creative outcomes over expert-designed baselines. Together, the results suggest that interactive elicitation can reveal creative potential that static FPSP-style assessment may miss, providing a formative and diagnostic lens for contextualized creativity assessment in AI-mediated learning contexts.

2606.12077 2026-06-11 cs.LG 新提交

Efficient Time Series Clustering from Multiscale Reservoir Dynamics with Granular-Ball Anchoring Graph Optimization

基于多尺度储层动力学与粒球锚定图优化的高效时间序列聚类

Yifan Wang, Lifeng Shen, Shuyin Xia, Yi Wang

发表机构 * Chongqing Key Laboratory of Computational Intelligence, Key Laboratory of Cyberspace Big Data Intelligent Security, Ministry of Education, Sichuan-Chongqing Co-construction Key Laboratory of Digital Economy Intelligence and Key Laboratory of Big Data Intelligent Computing, College of Computer Science and Technology, Chongqing University of Posts and Telecommunications(重庆邮电大学计算机科学与技术学院,计算智能重庆市重点实验室,网络空间大数据智能安全教育部重点实验室,川渝共建数字经济智能重点实验室,大数据智能计算重点实验室) Chongqing Ant Consumer Finance Co., Ltd., Ant Group(蚂蚁集团,重庆蚂蚁消费金融有限公司)

AI总结 提出MSRGC-Net框架,结合无训练储层计算、粒球锚定图构建和共识学习,实现高效且准确的时间序列聚类。

Comments
Accepted by IJCAI 2026
AI中文摘要

时间序列聚类由于聚类效果与计算效率之间的固有权衡仍然具有挑战性。基于相似性的方法通常因成对距离计算而面临二次复杂度,而基于深度学习的方法通常依赖于昂贵的迭代训练和大量可训练参数。在本文中,我们提出了MSRGC-Net,一种高效的时间序列聚类框架,它集成了多尺度储层计算、基于粒球的锚定图构建和共识学习。MSRGC-Net采用无训练的储层计算范式,从原始时间序列中提取多尺度时间表示,无需反向传播,显著降低了计算开销。为了捕捉所得表示的内在结构,采用粒球计算通过密度一致区域自适应地建模数据分布,生成紧凑且鲁棒的锚定图表示。此外,引入了一种基于共识的锚定图优化策略,以有效对齐多尺度储层表示并整合跨时间尺度的互补信息。在广泛使用的单变量和多变量基准数据集上的大量实验表明,MSRGC-Net在聚类性能上持续优于最先进的方法,同时保持卓越的计算效率。

英文摘要

Time-series clustering remains challenging due to the inherent trade-off between clustering effectiveness and computational efficiency. Similarity-based methods often suffer from quadratic complexity caused by pairwise distance computations, while deep learning-based approaches typically rely on costly iterative training and a large number of trainable parameters. In this paper, we propose MSRGC-Net, an efficient time-series clustering framework that integrates multiscale reservoir computing, granular-ball-based anchoring graph construction, and consensus learning. MSRGC-Net adopts a training-free reservoir computing paradigm to extract multiscale temporal representations from raw time series without backpropagation, significantly reducing computational overhead. To capture the intrinsic structure of the resulting representations, granular-ball computing is employed to adaptively model data distributions via density-consistent regions, yielding compact and robust anchor graph representations. Furthermore, a consensus-based anchoring graph optimization strategy is introduced to effectively align multiscale reservoir representations and integrate complementary information across temporal scales. Extensive experiments on widely used univariate and multivariate benchmark datasets demonstrate that MSRGC-Net consistently outperforms state-of-the-art methods in clustering performance while maintaining superior computational efficiency.
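The "training-free" feature extraction step can be illustrated with a tiny leaky echo-state reservoir run at several leak rates. This is a sketch under assumptions, not the MSRGC-Net implementation: the function names, the reservoir size, the leak rates used as "scales", and the choice of the final state as the feature vector are all hypothetical, and the granular-ball and consensus stages are not shown.

```python
import math
import random


def reservoir_state(series, size=8, leak=0.5, seed=0):
    """Run a fixed random leaky reservoir over the series and return its final state."""
    rng = random.Random(seed)
    w_in = [rng.uniform(-1.0, 1.0) for _ in range(size)]
    # Each row's absolute sum is at most 0.9, which keeps the recurrent map contractive.
    w = [[rng.uniform(-1.0, 1.0) * 0.9 / size for _ in range(size)]
         for _ in range(size)]
    x = [0.0] * size
    for u in series:
        pre = [w_in[i] * u + sum(w[i][j] * x[j] for j in range(size))
               for i in range(size)]
        # Leaky integration: a small leak gives a slow, long-memory state.
        x = [(1.0 - leak) * x[i] + leak * math.tanh(pre[i]) for i in range(size)]
    return x


def multiscale_features(series, leaks=(1.0, 0.5, 0.1), size=8, seed=0):
    """Concatenate final reservoir states obtained at several leak rates."""
    feats = []
    for k, leak in enumerate(leaks):
        feats.extend(reservoir_state(series, size=size, leak=leak, seed=seed + k))
    return feats


sine = [math.sin(0.3 * t) for t in range(50)]
features = multiscale_features(sine)
```

No weight is ever updated: the reservoir weights are drawn once from a seeded generator, so feature extraction is a single forward pass per series and the resulting vectors can be handed to any downstream clustering step.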

2606.12075 2026-06-11 cs.CR cs.LG 新提交

Categorical Robustness Assessment for Machine Learning based Network Intrusion Detection Systems

基于机器学习的网络入侵检测系统的分类鲁棒性评估

Mayank Raj, Nathaniel D. Bastian, Lance Fiondella, Gokhan Kul

AI总结 本文系统比较了CNN、LSTM和随机森林三种分类器在对抗攻击下的鲁棒性,发现随机森林基线准确率虽高但极易被攻破,而CNN表现最稳健。

AI中文摘要

网络入侵检测系统(NIDS)广泛使用机器学习(ML),但ML模型可能受到对抗性攻击的操纵。这些攻击向网络流量数据添加精心设计的扰动,导致误分类。虽然先前的工作已经证明了孤立环境下的对抗性漏洞,但在受控攻击条件下,跨架构以及基于攻击类别和类型的系统比较仍然有限,这使得从业者在对抗性环境中部署哪些模型缺乏明确指导。本文提出了一个简单的问题:当攻击者试图操纵系统时,哪种分类器架构实际上能够保持稳定?我们对三种流行架构进行了测试:一维卷积神经网络(CNN)、长短期记忆网络(LSTM)和随机森林(RF)集成。使用ACI-IoT-2023数据集(超过120万个样本,涵盖12种攻击类型),我们使用FGSM和PGD对抗攻击对每个模型进行攻击,这些攻击在归一化特征空间中应用基于梯度的扰动,符合既定的对抗性ML评估协议,扰动预算范围为$\epsilon=0.01$到$\epsilon=0.1$。令人惊讶的是,随机森林实现了近乎完美的基线准确率(99.98%),但在攻击下灾难性地崩溃,在我们测试的最小扰动下下降了73个百分点。另一方面,CNN在$\epsilon=0.01$时保持了95.5%的准确率,并且随着扰动的增加而优雅地退化。LSTM介于两者之间。这些发现颠覆了传统观念:如果模型在对抗压力的第一个迹象下就崩溃,那么高基线准确率毫无意义。对于在对抗性环境中部署入侵检测的从业者,我们推荐基于CNN的架构,并提供特定场景的部署指导。

英文摘要

Network Intrusion Detection Systems (NIDS) rely heavily on Machine Learning (ML), but ML models can be manipulated via adversarial attacks. These attacks add carefully crafted perturbations to network traffic data that lead to misclassifications. While prior work has demonstrated adversarial vulnerabilities in isolated settings, systematic comparisons across architectures, attack classes, and attack categories under controlled attack conditions remain limited, leaving practitioners without clear guidance on which models to deploy in adversarial environments. This paper asks a simple question: which classifier architectures actually hold up when attackers try to manipulate the system? We put three popular architectures through their paces: a 1D Convolutional Neural Network (CNN), a Long Short-Term Memory (LSTM) network, and a Random Forest (RF) ensemble. Using the ACI-IoT-2023 dataset (over 1.2 million samples spanning 12 attack types), we subject each model to FGSM and PGD adversarial attacks, which apply gradient-based perturbations in normalized feature space consistent with established adversarial ML evaluation protocols, at perturbation budgets ranging from $\epsilon=0.01$ to $\epsilon=0.1$. Surprisingly, Random Forest achieved near-perfect baseline accuracy (99.98%), yet collapsed catastrophically under attack, dropping 73 percentage points at the smallest perturbation we tested. CNN, on the other hand, retained 95.5% accuracy at $\epsilon=0.01$ and degraded gracefully as perturbations increased. LSTM fell somewhere in between. These findings flip the conventional wisdom: high baseline accuracy means nothing if a model shatters at the first sign of adversarial pressure. For practitioners deploying intrusion detection in adversarial environments, we recommend CNN-based architectures and provide scenario-specific deployment guidance.
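As a reminder of how an FGSM perturbation works in a normalized feature space, here is a minimal sketch on a two-feature logistic regression. The model, weights, and input are made up for illustration and have nothing to do with the classifiers or dataset in the paper; the only thing shown is the standard one-step update $x' = x + \epsilon \cdot \mathrm{sign}(\nabla_x L)$, clipped to $[0, 1]$.

```python
import math


def predict_proba(w, b, x):
    """Probability of the positive class under a logistic regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def fgsm(w, b, x, y, eps):
    """One-step FGSM against binary cross-entropy loss, clipped to [0, 1]."""
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]  # dL/dx for logistic regression
    sign = [(g > 0) - (g < 0) for g in grad]
    return [min(1.0, max(0.0, xi + eps * s)) for xi, s in zip(x, sign)]


w, b = [4.0, -4.0], 0.0   # toy weights (hypothetical)
x, y = [0.55, 0.45], 1    # normalized features, true label 1
x_adv = fgsm(w, b, x, y, eps=0.1)
clean_score = predict_proba(w, b, x)      # above 0.5: classified correctly
adv_score = predict_proba(w, b, x_adv)    # below 0.5: prediction flipped
```

PGD repeats this step several times with a smaller step size and projects back into the $\epsilon$-ball around the original input after each step. Both attacks need a gradient, so a non-differentiable model such as a random forest cannot be attacked this way directly without some surrogate; the abstract does not say how that case was handled.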

2606.12074 2026-06-11 cs.CV cs.AI eess.IV 新提交

Non-frontal face recognition using GANs and memristor-based classifiers

基于GAN和忆阻器分类器的非正面人脸识别

Semih Vazgecen, Cristian Sestito, Spyros Stathopoulos, Themis Prodromakis

发表机构 * Centre for Electronics Frontiers, Institute for Integrated Micro and Nano Systems, School of Engineering, The University of Edinburgh(爱丁堡大学工程学院集成微纳系统研究所电子前沿中心)

AI总结 提出将轻量级GAN正面化与忆阻器神经形态识别结合,解决非正面人脸识别,在数据集上达96%准确率。

Comments
12 pages, 4 figures, 1 Supplementary (22 pages, 16 figures, 6 tables, 4 supplementary notes)
AI中文摘要

人脸识别系统通过深度学习技术取得了显著进展,在复杂场景中实现了高性能和鲁棒性。然而,这些方法带来了巨大的计算开销,限制了它们在资源受限平台(如无人机)上的原位适用性,而这些平台需要应对非正面人脸图像等挑战。基于忆阻器的神经形态系统已成为边缘AI应用的一种引人注目的方法,它将生物启发式处理与高效可扩展的计算相结合。在这项工作中,我们提出了一种人脸识别框架,通过集成基于轻量级生成对抗网络(GAN)的正面化处理和基于忆阻器的神经形态识别,来解决非正面姿态变化问题。在两个数据集上的实验结果表明,将对抗学习与忆阻技术相结合的有效性,实现了高达96%的识别准确率。所提出的方法缓解了传统AI的计算瓶颈,并为动态真实环境中的人脸识别提供了一种可扩展、高效的解决方案。

英文摘要

Face recognition systems have advanced significantly through deep learning techniques, delivering high performance and robustness in complex scenarios. However, these approaches incur substantial computational overhead, limiting their in situ applicability in resource-constrained platforms such as drones, where they can address challenges including non-frontal facial imagery. Memristor-based neuromorphic systems have emerged as a compelling approach for edge AI applications, combining biologically inspired processing with efficient and scalable computation. In this work, we propose a facial recognition framework that addresses non-frontal pose variations by integrating lightweight generative adversarial network (GAN)-based pose frontalisation with memristor-based neuromorphic recognition. The experimental results on two datasets demonstrate the effectiveness of combining adversarial learning with memristive technology, achieving up to 96% identification accuracy. The proposed approach alleviates the computational bottlenecks of conventional AI and offers a scalable, efficient solution for face recognition in dynamic real-world environments.

2606.12073 2026-06-11 cs.SI cs.AI 新提交

"That's AI Slop, You Bot!" Studying Accusations, Evidence, and Credibility in Online Discourse Towards LLM-Generated Comments

“那就是AI垃圾,你这个机器人!”:研究针对LLM生成评论的指责、证据与可信度

Jason Miklian, John E. Katsos

AI总结 分析2023-2026年Hacker News和Reddit上的2500万条评论,发现AI使用指责中贬义标签的占比增长超十倍,但区分AI与人类文本的文体特征并不能预测哪些人类文本会被指责,这类指责实为基于感知真实性的社会把关。

AI中文摘要

生成式AI使得流畅的散文变得廉价易得,打破了“好文章意味着真思考”的旧承诺。读者如何回应?这能告诉我们关于反AI态度变化的什么信息?我们分析了来自Hacker News和Reddit(2023-2026年)的2500万条评论,结合了对7500个抽样AI使用指责的LLM判断、情感轨迹、300个确认AI使用指责的言语行为编码,以及被指责与未被指责的父评论的匹配对照测试。我们发现,两个平台上指责中贬义标签的份额增长了十倍以上,而2022年前的不真实性词汇(如shill、astroturf)的安慰剂词汇则没有。这一转变反映了一个快速增长的趋势:将任何可疑或看似不真实的散文标记为“AI垃圾”。AI垃圾框架现在占贬义提及的94%,主导评论的语气从嘲笑转向把关和结构性抗议。关键惊喜来自匹配对照测试,该测试发现,统计上区分AI与人类文本的散文特征并不能预测哪些人类文本会被指责为AI。新的指责作为感知真实性的社会把关,实际上并不筛查AI。这项研究扩展了信号理论,表明当底层检测问题无法在非专家层面解决时,即使不准确,社会使用的替代信号也会增长。它表明,AI对写作的影响从读者侧来看与生产(作者)侧不同。检测技术无法解决这种动态,因为指责的社会功能日益表现为社会把关和群体内信号传递,而非识别AI生成的写作。

英文摘要

Generative AI has made fluent prose cheap to produce, breaking the old promise to readers that good writing meant real thinking. How have readers responded, and what can this tell us about changing anti-AI attitudes? We analyzed 25 million comments from Hacker News and Reddit (2023-2026), combining LLM judgment on 7,500 sampled accusations of AI use, sentiment trajectories, speech-act coding of 300 confirmed accusations of AI use, and a matched-control test of accused versus non-accused parent comments. We found that the pejorative-label share of accusations rose more than tenfold on both platforms while a placebo vocabulary of pre-2022 inauthenticity terms (shill, astroturf) did not. This shift reflected a fast-growing trend of branding any suspicious or seemingly inauthentic prose as "AI slop". The slop frame now constitutes 94 percent of pejorative mentions, with the dominant comments shifting in tone from mockery toward gatekeeping and structural protest. The key surprise comes from a matched-control test which found that prose features that statistically distinguish AI from human text do not predict which human text gets accused as AI. The new accusations work as social gatekeeping of perceived authenticity without actually screening for AI. This research extends signaling theory by showing that substitute signals used socially can grow even when inaccurate if the underlying detection problem cannot be solved at the non-expert level. It shows that AI's effects on writing from the reader side are distinct from those on the production (writer) side. Detection technology cannot resolve this dynamic because the social function of accusations is increasingly to perform social gatekeeping and in-group signaling as opposed to identifying AI-generated writing.

2606.12072 2026-06-11 cs.CV 新提交

World Model Self-Distillation: Training World Models to Solve General Tasks

世界模型自蒸馏:训练世界模型以解决通用任务

Sebastian Stapf, Pablo Acuaviva Huertos, Aram Davtyan, Paolo Favaro

发表机构 * Department of Computer Science(计算机科学系)

AI总结 提出结合自蒸馏与强化学习的框架,从预训练视频生成器中提取任务解决能力,无需配对任务视频,在基准测试中超越原始模型。

AI中文摘要

预训练视频生成器是有前景的视觉世界模型,展现出涌现的任务解决能力;然而,它们对详细文本描述的依赖限制了其在规划和决策中的直接使用。现有方法要么将这种推理外包给语言或视觉-语言模型,要么依赖带有配对任务执行视频的监督微调,后者收集成本高且难以扩展。我们提出一个可扩展的框架,通过结合自蒸馏与强化学习来激发此类模型的任务解决能力。给定一张无标注场景图像,视觉-语言模型生成候选任务和详细的逐步解决方案。该解决方案条件化一个预训练视频扩散模型(演示者);我们将其行为蒸馏到一个仅以图像和简短任务提示为条件的执行者中。这将执行知识从字幕引导生成转移到指令条件任务解决,无需精心策划的任务视频监督。我们进一步通过来自VLM反馈的强化学习改进执行者,利用判断采样视频是否满足任务与生成解决方案之间的不对称性。在我们提出的WorldTasks-Benchmark和DreamGen机器人基准上的实验表明,在我们基于VLM的评估协议下,执行者超越了演示者,并具有竞争力地迁移到机器人任务。

英文摘要

Pretrained video generators are promising visual world models that exhibit emergent task-solving abilities; however, their reliance on detailed textual descriptions limits their direct use for planning and decision-making. Existing approaches either outsource this reasoning to language or vision-language models, or rely on supervised fine-tuning with paired task-execution videos, which are costly to collect and difficult to scale. We propose a scalable framework that elicits task-solving ability in such models by combining self-distillation with reinforcement learning. Given an unlabeled scene image, a vision-language model generates a candidate task and a detailed step-by-step solution. The solution conditions a pretrained video diffusion model, the Demonstrator; we distill its behavior into an Executor conditioned only on the image and a short task prompt. This transfers execution knowledge from caption-guided generation to instruction-conditioned task solving without curated task-video supervision. We further improve the Executor with reinforcement learning from VLM feedback, exploiting the asymmetry between judging whether a sampled video satisfies a task and generating the solution. Experiments on our proposed WorldTasks-Benchmark and the DreamGen robotics benchmark show that the Executor surpasses the Demonstrator under our VLM-based evaluation protocol and transfers competitively to robotic tasks.