arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.20802 2026-05-21 cs.AR cs.AI

ELSA: An ELastic SNN Inference Architecture for Efficient Neuromorphic Computing

Kang You, Chen Nie, Lee Jun Yan, Ziling Wei, Cheng Zou, Zekai Xu, Yu Feng, Honglan Jiang, Zhezhi He

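AI summary: This paper proposes ELSA, an architecture that achieves true elastic inference through a fine-grained spine/token-wise pipeline and SNN-specific hardware optimizations, substantially reducing latency without sacrificing accuracy. Experiments show that SNNs can outperform quantized artificial neural networks while maintaining on-par accuracy.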

Comments 17 pages, Proceedings of the 53rd Annual International Symposium on Computer Architecture (ISCA), 2026

Abstract

Spiking neural networks (SNNs) exploit event-driven and addition-only computation to substantially improve efficiency for intelligent computation. A key temporal property of SNNs, elastic inference, allows outputs to emerge progressively, enabling responses to salient inputs much earlier than full evaluation. However, existing SNN-specific accelerators cannot capitalize on this property. Layer-by-layer designs emit outputs only after all layers are complete, while time-step-by-time-step designs rely on coarse-grained, layer-wise pipelines that require synchronizing all spines/tokens within a layer. This barrier prevents results from being forwarded immediately, delaying the earliest possible response and forfeiting the benefits of elastic inference. To address these challenges, we propose ELSA, a near-SRAM dataflow architecture that realizes true elastic inference through a fine-grained spine/token-wise pipeline and hardware optimizations tailored to SNNs. ELSA forwards each spine/token immediately upon production, forming a continuous streaming pipeline that substantially reduces the latency to the first response. To enhance this lightweight execution, ELSA introduces a bundled address event representation protocol to lower communication traffic of network-on-chip (NoC), and leverages mini-batch spiking Gustavson-product to cut memory access and exploit inherent sparsity. Combined with mapping and scheduling optimizations, ELSA achieves efficient, event-driven computation without compromising accuracy. Experiments show that SNNs can outperform quantized artificial neural networks (QANNs) while maintaining on-par accuracy. For a 4-bit ResNet-50, ELSA achieves 3.4$\times$ speedup and 13.6$\times$ higher energy efficiency over the SOTA QANN accelerator (ANT), and 2.9$\times$ speedup and 22.1$\times$ energy efficiency gains over the SOTA SNN accelerator (PAICORE).
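As a rough illustration of why binary spikes make a Gustavson-style (row-wise) product addition-only, the Python sketch below accumulates one weight row per active spike. It is a minimal sketch under stated assumptions, not ELSA's hardware dataflow: the name `spiking_gustavson`, the dictionary layout for sparse weight rows, and the list-of-active-indices spike format are all hypothetical, and mini-batching, the bundled address event representation, and NoC behavior are not modeled.

```python
from typing import Dict, List


def spiking_gustavson(
    spike_rows: List[List[int]],
    weights: List[Dict[int, float]],
    n_out: int,
) -> List[List[float]]:
    """Row-wise product of a binary spike matrix with a sparse weight matrix.

    spike_rows[t] lists the input indices that fired in row t (hypothetical format).
    weights[k] maps output index -> weight, i.e. sparse row k of the weight matrix.
    """
    outputs = []
    for active in spike_rows:
        acc = [0.0] * n_out
        for k in active:                     # visit only inputs that spiked
            for j, w in weights[k].items():  # stream the nonzeros of weight row k
                acc[j] += w                  # a spike is 0 or 1, so no multiply is needed
        outputs.append(acc)
    return outputs


# Illustrative data: three rows of spikes over three inputs.
spikes = [[0, 2], [1], []]
w = [{0: 1.0, 1: 2.0}, {1: -1.0}, {0: 0.5, 2: 3.0}]
result = spiking_gustavson(spikes, w, n_out=3)
```

Rows with no spikes cost nothing beyond allocating the accumulator, which is the sparsity benefit the abstract alludes to.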

2605.20799 2026-05-21 cs.DC cs.LG

Instant GPU Efficiency Visibility at Fleet Scale

Connor Pedersen, Dong H. Ahn, Michel Migdal, Collin Neale, Nik Konyuchenko

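AI summary: This paper proposes Overall FLOP Utilization (OFU), a hardware-level GPU efficiency metric for AI workloads on HPC systems, derived from two on-chip performance counters: Tensor Pipe Activity and SM clock frequency. OFU requires no application instrumentation and works across GPU generations and numeric precisions. Controlled GEMM experiments on H100 and GB200 characterize five properties of the OFU approximation across FP16, TF32, FP8, and NVFP4. After tile-quantization correction, OFU predicts application-level MFU to within 2 percentage points. Across 608 production training jobs, OFU reaches a correlation of 0.78 with application-level MFU and surfaces two framework-level FLOPs miscalculations. Deployed on large-scale GPU fleets, OFU detected a 2.5x efficiency regression and tracked precision-dependent utilization changes in mixed-precision pretraining. The evaluation and operational experience suggest that OFU is a practical, deployable complement to application-level MFU for continuous fleet-wide efficiency monitoring.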

Comments 12 pages, 7 figures, 3 tables

Abstract

We present Overall FLOP Utilization (OFU), a hardware-level, precision-agnostic GPU efficiency metric for AI workloads on HPC systems, derived from two on-chip performance counters: Tensor Pipe Activity and SM clock frequency. OFU requires no application instrumentation and works across GPU generations and numeric precisions. We characterize five properties of the OFU approximation -- tile quantization, floating-point precision scaling, clock sampling noise, Tensor Core clock domains, and non-tensor undercounting -- through controlled GEMM experiments on H100 and GB200 across FP16, TF32, FP8, and NVFP4. After tile-quantization correction, OFU predicts application-level MFU to within 2 percentage points. Against 608 production training jobs, OFU achieves r = 0.78 correlation with application-level MFU and surfaces two framework-level FLOPs miscalculations. Deployed across large-scale GPU fleets, OFU has detected a 2.5x efficiency regression and tracked precision-dependent utilization changes in mixed-precision pretraining. Our evaluation and operational experience suggest OFU is a practical, deployment-ready complement to application-level MFU for continuous fleet-wide efficiency monitoring.
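The abstract does not give the OFU formula, so the Python sketch below only shows one plausible way the two named counters could combine: tensor-pipe busy fraction scaled by the ratio of the sampled SM clock to the clock behind the peak-FLOPs rating. This combination is an assumption, not the authors' definition, and the function name and parameters are hypothetical. The tile-quantization correction and the other four properties studied in the paper are not modeled.

```python
def approx_flop_utilization(
    tensor_active: float,
    sm_clock_mhz: float,
    peak_clock_mhz: float,
) -> float:
    """Hypothetical utilization estimate from two hardware counters.

    tensor_active: fraction of cycles the tensor pipe was busy, in [0, 1].
    sm_clock_mhz: sampled SM clock frequency.
    peak_clock_mhz: clock assumed by the device's peak-FLOPs rating.
    """
    if not 0.0 <= tensor_active <= 1.0:
        raise ValueError("tensor_active must be a fraction in [0, 1]")
    if sm_clock_mhz < 0 or peak_clock_mhz <= 0:
        raise ValueError("clock frequencies must be positive")
    # Busy cycles at a throttled clock deliver proportionally fewer FLOPs.
    return tensor_active * (sm_clock_mhz / peak_clock_mhz)


# Illustrative numbers only: pipe busy half the time at 75% of the peak clock.
estimate = approx_flop_utilization(0.5, 1500.0, 2000.0)
```

Because both inputs come from hardware counters, an estimate of this kind needs no changes to the training job, which matches the abstract's claim of requiring no application instrumentation.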

2605.20734 2026-05-21 cs.CR cs.AI

An Application-Layer Multi-Modal Covert-Channel Reference Monitor for LLM Agent Egress

Alfredo Metere

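AI summary: This paper proposes an application-layer multi-modal covert-channel reference monitor that detects and prevents data leakage in messages sent by LLM agents, using a multi-stage text pipeline, media scramblers, and residual-capacity measurement to monitor and manage covert channels.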

Abstract

A large language model (LLM) agent that sends messages can leak data inside them. Destination allowlists and content scanners do not police whether an otherwise-benign payload is itself a covert channel: a compromised agent encodes bits in zero-width characters, homoglyphs, whitespace, base64, JavaScript Object Notation (JSON) key ordering, message timing or size -- and, in binary egress, in least-significant-bit (LSB) pixel planes, per-image mean luminance, inter-image sequence permutation, ultrasonic tones, or audible-band sonified data. Our egress reference monitor has three contributions. (i) A text pipeline of ten capacity-reducing stages, a per-sink leaky-bucket capacity ledger, and a staged posture that enforces lossless stages from day one. (ii) Two media scramblers (a Fourier-domain audio band-limiter and a red-green-blue (RGB) image bit-depth and mean-luminance bucketer) gated by a boot-time cryptographic legitimacy attestation: an auditor publishes at boot the trusted Ed25519 keys and {kind, data-class} pairs; only payloads with a verifying signature for an authorized class are exempt. The attestation sidesteps the intractable content-based discrimination between real media and data sonified or rasterized as a carrier; unsigned media is suspect by default; a content-addressed canonicalizer closes the inter-image permutation channel. (iii) Residual capacity is the Miller--Madow corrected mutual information between embedded and recovered bits (zero when destroyed), measured by an adversarial ensemble of fifteen working encoders across text, image and audio. The reference implementation drives residual capacity to zero on every destroyable channel and to a stated bound on the one (per-image mean luminance) that cannot be destroyed without ruining the image.
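The residual-capacity measure in contribution (iii) can be sketched with the standard Miller-Madow bias correction, which adds (K - 1) / (2N) nats to the plug-in entropy of N samples over K observed symbols. The Python below is a minimal sketch, not the reference implementation: the function names are hypothetical, and clipping negative estimates to zero is an assumption made here so that a destroyed channel reads as zero.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy_mm(samples: Sequence[Hashable]) -> float:
    """Plug-in Shannon entropy in bits plus the Miller-Madow correction."""
    n = len(samples)
    counts = Counter(samples)
    plug_in = -sum((c / n) * math.log2(c / n) for c in counts.values())
    # (K - 1) / (2N) is in nats; divide by ln 2 to express it in bits.
    return plug_in + (len(counts) - 1) / (2 * n * math.log(2))


def mutual_information_mm(embedded: Sequence[Hashable],
                          recovered: Sequence[Hashable]) -> float:
    """Corrected mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    if len(embedded) != len(recovered) or not embedded:
        raise ValueError("need two non-empty sequences of equal length")
    joint = list(zip(embedded, recovered))
    mi = entropy_mm(embedded) + entropy_mm(recovered) - entropy_mm(joint)
    return max(mi, 0.0)  # assumption: clip small negative estimates to zero


bits = [0, 1] * 50
intact = mutual_information_mm(bits, bits)          # channel passes every bit
destroyed = mutual_information_mm(bits, [0] * 100)  # scrambler output is constant
```

An intact one-bit channel yields roughly one bit per symbol, while a channel whose output no longer depends on the embedded bits yields zero.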

2605.20717 2026-05-21 cs.NE cs.AR cs.CV eess.IV

E-ReCON: An Energy- and Resource-Efficient Precision-Configurable Sparse nvCIM Macro for Conventional and Spiking Neural Edge Inference

Ankit Kumar Tenwar, Mukul Lokhande, Santosh Kumar Vishvakarma

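AI summary: This paper proposes E-ReCON, a 16 Kb energy- and resource-efficient digital compute-in-memory (DCIM) macro based on a compact 3T1R ReRAM bitcell, for conventional and spiking neural network edge inference. A novel interleaved 10T/28T adder tree reduces transistor count and power consumption, and the macro achieves low latency, high throughput, and high energy efficiency in 65 nm CMOS across a range of neural network models.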

Abstract

This work presents E-ReCON, a 16 Kb energy and resource-efficient digital compute-in-memory (DCIM) macro based on a compact 3T1R ReRAM bitcell for edge-AI inference. The proposed bitcell occupies only 0.85 um^2 and supports reliable AND-based in-memory multiplication for both conventional convolutional neural network (CNN) and spiking neural network (SNN) workloads. To reduce accumulation overhead, a novel interleaved 10T/28T adder tree is introduced, reducing transistor count and power consumption by 37% and 28%, respectively, compared to a conventional 28T RCA-based design. Implemented in 65 nm CMOS at 1.2 V, the proposed macro achieves a minimum latency of 0.48 ns, throughput of 2.31-3.1 TOPS, and energy efficiency of up to 419 TOPS/W. When evaluated on LeNet-5, AlexNet, and CNN-8 models, the macro achieves 97.81%, 93.23%, and 96.51% accuracy on MNIST/A-Z, CIFAR10, and SVHN datasets, respectively. In addition, 40% pruning preserves nearly 99.8% of the original accuracy while reducing MAC operations and computation cycles. For SNN-oriented workloads, the proposed AND-type bitcell efficiently supports spike-weight multiplication with low switching activity, where the 2A2W configuration achieves accuracy close to the FP32 baseline across VGG-8, VGG-16, and ResNet-18 networks on CIFAR-10, CIFAR-100, and ImageNet-1K datasets. Compared to prior ADC-based ReRAM-CIM designs, the proposed architecture improves latency and energy efficiency by nearly 30-40% while maintaining robust operation under full PVT and ReRAM variability. Overall, E-ReCON provides a scalable, low-latency, and energy-efficient nvCIM platform for next-generation edge-AI, IoT, biomedical sensing, and neuromorphic applications.

2605.20704 2026-05-21 cs.CR cs.AI cs.MA

Heartbeat-Bound Hierarchical Credentials: Cryptographic Revocation for AI Agent Swarms

Saurabh Deochake

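AI summary: This paper proposes a heartbeat-bound hierarchical credential protocol that binds credential validity to periodic parent liveness proofs, enabling efficient credential revocation without network connectivity, substantially shortening the time zombie agents survive and improving verification efficiency.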

Abstract

Autonomous AI agents that spawn sub-agent swarms create a safety gap: existing credential revocation mechanisms, OAuth 2.0 introspection, OCSP, and W3C Status Lists, require network connectivity to a central authority, leaving "zombie agents" executing privileged operations for minutes to hours after operator shutdown. We present Heartbeat-Bound Hierarchical Credentials (HBHC), a cryptographic protocol that binds credential validity to periodic parent liveness proofs. Verifiers enforce freshness using only a cached public key and local clock; no network round-trip is required. When heartbeat generation ceases, all descendant credentials become unusable within a deterministically bounded window $W_z \le W_{\max} + Δ_h + ε$, conditional on bounded clock skew and parent keys held in secure enclaves. Evaluation at the protocol layer and with real LLM-backed agent swarms (GPT-4o-mini) demonstrates a 90$\times$ reduction in the zombie window over OAuth 2.0, 0.26 ms full authentication in Rust, 18,000+ verifications per second under concurrent HTTP load, and stable per-verification latency from 10 to 10,000 agents. Real-agent experiments show 0.71% end-to-end overhead on tool calls, zero post-revocation tool calls under prompt injection that bypasses application-layer guardrails, and cascading revocation across a 49-agent four-level hierarchy within the theoretical bound.
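The offline freshness check can be pictured with a short Python sketch. It is a sketch under stated assumptions, not the HBHC protocol: signature verification is reduced to a boolean field rather than a real public-key check, the names `Heartbeat`, `credential_usable`, and `zombie_window_bound` are hypothetical, and reading $Δ_h$ as the heartbeat interval and $ε$ as the clock-skew allowance is an interpretation of the abstract's bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Heartbeat:
    issued_at: float    # parent's timestamp in seconds
    signature_ok: bool  # stand-in for verifying against the cached public key


def credential_usable(hb: Heartbeat, now: float, w_max: float, skew: float) -> bool:
    """Accept only if the latest heartbeat is authentic and recent enough.

    Uses nothing but the local clock, so no network round-trip is involved.
    """
    if not hb.signature_ok:
        return False
    age = now - hb.issued_at
    # Tolerate bounded clock skew in both directions.
    return -skew <= age <= w_max + skew


def zombie_window_bound(w_max: float, heartbeat_interval: float, skew: float) -> float:
    """Upper bound W_max + Delta_h + epsilon from the abstract (assumed meanings)."""
    return w_max + heartbeat_interval + skew


hb = Heartbeat(issued_at=100.0, signature_ok=True)
still_valid = credential_usable(hb, now=104.0, w_max=5.0, skew=0.5)
expired = credential_usable(hb, now=106.0, w_max=5.0, skew=0.5)
bound = zombie_window_bound(5.0, 1.0, 0.5)
```

Once the parent stops issuing heartbeats, every verifier running a check of this kind starts rejecting on its own as soon as the last heartbeat ages out.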

2605.20687 2026-05-21 eess.IV cs.LG

Motion-Robust Deep Reconstruction for Free-Breathing Cardiac Cine MRI

Mahmut Yurt, Kanghyun Ryu, Zhitao Li, Xucheng Zhu, Xianglun Mao, Martin Janich, Marcus Alley, Kawin Setsompop, John Pauly, Shreyas Vasanawala, Ali Syed

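AI summary: This paper proposes Cine-DL, a framework that combines targeted k-space preprocessing with fast, model-based deep reconstruction to address the streak artifacts of free-breathing radial acquisitions at high acceleration, improving feasibility for clinical use.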

Abstract

Conventional cardiac cine MRI relies on breath-hold Cartesian acquisitions, which are vulnerable to motion artifacts and can be uncomfortable or infeasible, particularly for pediatric and other noncompliant patients who cannot reliably hold their breath. Free-breathing radial acquisitions can alleviate these limitations, but robust reconstruction at high acceleration remains challenging due to prominent streak artifacts. To address these limitations, we propose Cine-DL, a clinically oriented framework that couples targeted k-space preprocessing with fast, model-based deep reconstruction. In this pipeline, raw free-breathing radial data undergo retrospective cardiac binning and respiratory gating to resolve cardiac phases and discard motion-corrupted spokes. We then introduce Streak Optimized Coil Compression (SOC), which explicitly preserves cardiac signals while suppressing peripheral interference that typically drives the streak artifacts. The resulting 2D+t cine series is reconstructed with an unrolled network that alternates a ResNet proximal operator with physics-based data consistency updates solved via conjugate gradient. We further employ a memory-efficient training strategy that reduces peak memory usage. We evaluate Cine-DL on free-breathing volunteer data against established baselines (k-t SENSE and iGRASP) and demonstrate clinical translation via hospital deployment on newly acquired patient data. Our experiments show that Cine-DL consistently improves quantitative metrics and visual fidelity, supporting a practical route toward routine, time-sensitive clinical adoption of free-breathing cine MRI.

2605.20681 2026-05-21 stat.ME cs.LG

Scale-Calibrated Median-of-Means for Robust Distributed Principal Component Analysis

Kisung You

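AI summary: This paper studies a scale-calibrated median-of-means estimator for robust distributed principal component analysis, built on the product geometry of Euclidean space and the Grassmann manifold. It derives a node-level PCA expansion, proves that the proposed product-manifold median-of-means estimator is asymptotically equivalent to a scaled spatial median of node influence errors, and presents robust block-scale and inference-optimal calibration rules together with high-probability median-of-means bounds.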

Abstract

Distributed principal component analysis (PCA) produces node-level estimates of both a mean vector and a principal subspace. Robustly aggregating these heterogeneous objects requires a relative scale between mean error and subspace error. We study a scale-calibrated median-of-means estimator for this problem using the product geometry of Euclidean space and the Grassmann manifold. A node-level PCA expansion shows that the mean component has the usual linear influence, whereas the subspace component is an eigengap-weighted covariance perturbation. We prove a local reduction showing that the proposed product-manifold median-of-means estimator is asymptotically equivalent to a scaled spatial median of node influence errors. This yields fixed-node non-Gaussian limits, growing-node Gaussian limits with finite-block bias, and an explicit scale-dependent covariance formula. We propose robust block-scale and inference-optimal calibration rules, establish high-probability median-of-means bounds, characterize factorwise bad-node influence, and prove node-bootstrap validity. Simulations and large-scale single-cell RNA-seq data show that scale calibration adapts to eigengap-driven subspace uncertainty and provides a robust distributed PCA summary.
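For readers unfamiliar with the median-of-means idea itself, the Python sketch below shows the plain scalar version: each node reports the mean of its own block, and the aggregate is the median of those node means. This is only the textbook building block, written here as an assumption-laden illustration. The paper's estimator works on the product of Euclidean space and the Grassmann manifold with a calibrated scale, none of which is modeled, and `median_of_means` is a hypothetical name.

```python
from statistics import mean, median
from typing import List, Sequence


def median_of_means(node_samples: Sequence[Sequence[float]]) -> float:
    """Median of per-node sample means (scalar median-of-means)."""
    node_means: List[float] = [mean(block) for block in node_samples]
    return median(node_means)


# Five nodes; the fourth is corrupted and reports wildly wrong values.
nodes = [
    [1.0, 1.2, 0.8],
    [0.9, 1.1, 1.0],
    [1.05, 0.95, 1.0],
    [1000.0, 1000.0, 1000.0],
    [1.1, 0.9, 1.0],
]
robust = median_of_means(nodes)
naive = mean(x for block in nodes for x in block)
```

A single bad node drags the pooled mean far from 1, while the median of node means stays near 1 as long as fewer than half of the nodes are bad.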

2605.20649 2026-05-21 eess.SP cs.AI cs.LG

AMAR: Lightweight Attention-Based Multi-User Activity Recognition from Wi-Fi CSI

Amirhossein Mohammadi, Hina Tabassum

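AI summary: This paper proposes AMAR, a lightweight attention-based multi-user activity recognition framework that formulates activity recognition as a set prediction problem. Using a transformer architecture and an edge-cloud split architecture, it recognizes concurrent activities in multi-user environments more accurately while substantially reducing bandwidth usage and occupancy estimation error.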

Comments 25 pages, 6 figures, 3 tables

Abstract

Wi-Fi-based human activity recognition (HAR) has emerged as a promising approach for contactless sensing, leveraging channel state information (CSI) collected from wireless transceivers. While existing studies have primarily concentrated on single-user scenarios, real-world deployments often involve multi-user settings where concurrent users' movements induce overlapping CSI patterns that challenge conventional classification methods. To address this limitation, this paper introduces an attention-based multi-user activity recognition (AMAR) framework that formulates HAR as a set prediction problem. The transformer-based architecture in AMAR leverages learnable query embeddings acting as specialized activity detectors, enabling the simultaneous identification of multiple activities from composite CSI representations. Moreover, to address deployment constraints, AMAR adopts an edge-cloud split architecture in which lightweight convolutional networks on edge devices perform initial feature extraction, followed by residual vector quantization that achieves substantial bandwidth reduction while preserving activity-discriminative information. The cloud component performs final activity prediction through attention-based set matching, enabling the system to handle varying occupancy levels. Across classroom, meeting-room, and empty-room environments, AMAR on average nearly doubles the rate of perfectly predicting all concurrent activities compared to the best baseline. Moreover, it achieves an $F_1$-score of 53.4% compared to 45.6% for the best benchmark, and reduces occupancy estimation error by 74%, while substantially reducing bandwidth usage.
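The bandwidth saving from residual vector quantization comes from sending a few small codebook indices instead of a full feature vector. The Python sketch below shows the generic technique with hand-picked toy codebooks. It is not AMAR's trained quantizer: the function names, codebook contents, and two-stage depth are assumptions for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Codebook = Sequence[Vector]


def nearest(codebook: Codebook, v: Vector) -> int:
    """Index of the codeword closest to v in Euclidean distance."""
    return min(range(len(codebook)), key=lambda i: math.dist(codebook[i], v))


def rvq_encode(v: Vector, codebooks: Sequence[Codebook]) -> List[int]:
    """Quantize v stage by stage; each stage encodes the previous residual."""
    residual = list(v)
    codes = []
    for cb in codebooks:
        i = nearest(cb, residual)
        codes.append(i)
        residual = [r - c for r, c in zip(residual, cb[i])]
    return codes


def rvq_decode(codes: Sequence[int], codebooks: Sequence[Codebook]) -> List[float]:
    """Reconstruct by summing the selected codeword from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for i, cb in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, cb[i])]
    return out


# Toy two-stage codebooks: a coarse stage followed by a fine correction stage.
books = [
    [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]],
]
codes = rvq_encode([1.2, 0.9], books)  # what the edge device would transmit
approx = rvq_decode(codes, books)      # what the cloud side would reconstruct
```

Here the edge side transmits two small integers rather than two floats, and adding stages trades a little more bandwidth for a smaller reconstruction error.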

2605.20641 2026-05-21 cs.CR cs.AI cs.LG

Trusted Weights, Treacherous Optimizations? Optimization-Triggered Backdoor Attacks on LLMs

Yifei Wang, Tianlin Li, Xiaohan Zhang, Yida Yang, Xiaoyu Zhang, Li Pan

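AI summary: This paper proposes an attack that exploits compilation optimization to implant stealthy backdoors in large language models. Two complementary strategies carry out the attack without modifying the compiler or hardware, and the attack achieves high success rates on several open-source LLMs.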

Comments 20 pages, 3 figures

Abstract

Inference optimization is a vital technique for deploying LLMs at scale. Compilation is the most widely adopted optimization technique for LLMs. While it assumes semantic equivalence between the original and compiled graphs, we are the first to show that its numerical side effects can be maliciously exploited to implant stealthy backdoors in LLMs. We propose a unified optimization-triggered attack framework comprising two complementary strategies. Without any modification to the compiler or hardware, one strategy flips predictions for specific inputs only when the model is compiled, while the other uses a universal trigger that remains dormant under uncompiled execution but hijacks arbitrary inputs once compilation optimization is applied. Both attacks bypass standard safety evaluations run without compilation. We empirically demonstrate that these optimization-triggered backdoors achieve attack success rates averaging 90% across four mainstream open-source LLMs and four tasks, while clean accuracy is preserved at nearly 100% under all settings. Our findings reveal a novel attack surface at the intersection of optimization and security in the LLM deployment pipeline, and we investigate practical defenses to mitigate this threat.
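The "numerical side effects" mentioned above rest on a basic fact: floating-point addition is not associative, so a graph that is mathematically equivalent but evaluated in a different order can produce slightly different numbers. The Python sketch below demonstrates only that fact and how it can flip a thresholded decision. It is not the paper's attack, and treating reassociation as what a compiler does is a simplifying assumption for illustration.

```python
def eager_sum(a: float, b: float, c: float) -> float:
    """Left-to-right evaluation, standing in for uncompiled execution."""
    return (a + b) + c


def reassociated_sum(a: float, b: float, c: float) -> float:
    """Same expression regrouped, standing in for an optimized schedule."""
    return a + (b + c)


def decide(score: float, threshold: float = 0.6) -> bool:
    """A discrete decision that depends on the last bits of the score."""
    return score > threshold


x = eager_sum(0.1, 0.2, 0.3)         # 0.6000000000000001
y = reassociated_sum(0.1, 0.2, 0.3)  # 0.6
flipped = decide(x) != decide(y)
```

The two results differ only in the last bit, yet the decision changes, which is the kind of gap an evaluation run on just one execution mode would never observe.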

2605.20639 2026-05-21 math.OC cs.LG math.DS

Time-Dependent PDE-Constrained Optimization via Weak-Form Latent Dynamics

April Tran, Terry Haut, David Bortz, Youngsoo Choi

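AI summary: This paper proposes a weak-form latent-space reduced-order modeling framework for accelerating gradient-based PDE-constrained optimization. The method compresses high-dimensional solution trajectories into a latent representation and identifies parametric latent dynamics by weak-form system identification, enabling efficient optimization in many-query design and control settings.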

Abstract

Optimization problems constrained by high-dimensional, time-dependent partial differential equations require repeated forward and sensitivity solves, making high-fidelity optimization computationally prohibitive in many-query design and control settings. We present a weak-form latent-space reduced-order modeling framework for accelerating gradient-based PDE-constrained optimization. The proposed approach builds on Weak-form Latent Space Dynamics Identification (WLaSDI), which compresses high-dimensional solution trajectories into a low-dimensional latent representation and identifies parametric latent dynamics using weak-form system identification. By avoiding explicit numerical differentiation of training trajectories, the weak-form improves robustness to noisy data and yields more reliable surrogate dynamics for optimization. We formulate the resulting reduced PDE-constrained optimization problem and derive both direct-sensitivity and adjoint-based gradient expressions for the learned latent dynamics, enabling scalable gradient evaluation with respect to design parameters. The framework is demonstrated on three time-dependent benchmark problems: thermal radiative transfer for optimal hohlraum design, the two-stream instability Vlasov-Poisson system, and the inviscid Burgers equation. Across these examples, WLaSDI produces accurate optimal designs, remains robust under noisy training data, and delivers substantial computational savings, including speedups of up to five orders of magnitude relative to full-order optimization. These results demonstrate that weak-form latent dynamics provide an efficient and noise-robust surrogate foundation for gradient-based optimization of complex time-dependent PDE systems.
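Why the weak form avoids differentiating noisy trajectories can be seen from a one-line integration by parts. The derivation below is a generic sketch, not the paper's exact formulation: the latent model $\dot z = f(z;\theta)$, the interval $[a,b]$, and the test function $\phi$ vanishing at both endpoints are assumed notation.

```latex
% Assumed latent model: \dot{z}(t) = f(z(t); \theta) on [a, b].
% \phi is a smooth test function with \phi(a) = \phi(b) = 0.
\begin{align*}
\int_a^b \phi(t)\,\dot{z}(t)\,dt
  &= \bigl[\phi(t)\,z(t)\bigr]_a^b - \int_a^b \dot{\phi}(t)\,z(t)\,dt
   = -\int_a^b \dot{\phi}(t)\,z(t)\,dt, \\
-\int_a^b \dot{\phi}(t)\,z(t)\,dt
  &= \int_a^b \phi(t)\,f\bigl(z(t);\theta\bigr)\,dt .
\end{align*}
```

The derivative lands on the known test function rather than on the data, so fitting $\theta$ requires only integrals of the sampled trajectory $z(t)$, and integration averages out noise instead of amplifying it.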

2605.20625 2026-05-21 eess.SY cs.MA cs.RO cs.SY

Time-To-Reach Separation and Safety Filtering for Safe, Fair, and Efficient Multi-Agent Coordination

Matthew Low, Jasmine Jerry Aloor, Victoria Marie Tuck, Pierluigi Nuzzo, Jason J. Choi

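AI summary: This paper proposes a multi-agent coordination framework that uses minimum time-to-reach (TTR) as a unifying metric for priority assignment, temporal separation, and safety filtering, in order to coordinate multiple aerial vehicles merging into an air corridor while maintaining safe separation between vehicles.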

Comments 9 pages, 3 figures. Extended version (including appendix) of a paper submitted to the 65th IEEE Conf. on Decision and Control (2026)

Abstract

Advanced Air Mobility (AAM) operations are expected to significantly increase aerial traffic in urban airspace, requiring autonomous traffic management systems to ensure collision-free operations in highly congested environments. In this paper, we propose a multi-agent coordination framework that uses minimum time-to-reach (TTR) as a unifying metric for priority assignment, temporal separation, and safety filtering. We focus on the problem of coordinating multiple aerial vehicles merging into an air corridor while maintaining safe separation between vehicles. Vehicles are assigned arrival-consistent priority based on TTR, and target TTR values are used to enforce temporal spacing that induces spatial separation. A priority-consistent safety filtering layer based on Hamilton-Jacobi reachability value functions ensures collision avoidance while minimally modifying the reference guidance. Simulation results in a highly congested corridor merging scenario show that the proposed method improves safety, fairness, and efficiency compared to time-optimal guidance and priority-agnostic safety filtering.

2605.20623 2026-05-21 math.AP cs.AI

Lower Bounds for Advection-Diffusion Equations: An Exploration with AI-Generated Proofs

Chenyang An, Xiaoqian Xu

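AI summary: Using AI-generated proofs, this paper establishes explicit lower bounds for advection-diffusion equations in three settings: a polynomial $\dot H^{-1}$ bound for inviscid shears, a uniform positive lower bound on the mixing scale for diffusive shears, and an exponential $L^2$ bound for rapidly oscillating time-periodic flows.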

Comments 63 pages

Abstract

We establish explicit lower bounds for advection-diffusion equations in three settings: a polynomial $\dot H^{-1}$ bound for inviscid shears with $u\in L^\infty_t W^{1,1}_y$, a uniform positive lower bound on the mixing scale for diffusive shears, and an exponential $L^2$ bound for rapidly oscillating time-periodic flows. All constants are explicit in the data. The proofs were generated entirely by a multi-agent math proving system, QED, without expert human intervention, serving as a test of AI's capability to produce rigorous mathematics.

2605.20563 2026-05-21 cs.MA cs.AI cs.CL cs.LG cs.SE

Multi-agent Collaboration with State Management

Mengyang Liu, Taozhi Chen, Zhenhua Xu, Xue Jiang, Yihong Dong

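AI summary: This paper proposes STORM, a state management approach for multi-agent collaboration that mediates agents' interactions with a shared workspace, ensuring that each agent operates on a consistent view of the codebase and that conflicts are detected and resolved at write time. Across multiple LLMs, STORM outperforms a git-worktree-based multi-agent baseline with comparable or better cost efficiency, suggesting that explicit state management is more effective than workspace isolation.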

Abstract

Recent advances in multi-agent systems have shown great potential for solving complex tasks. However, when multiple agents edit a shared codebase concurrently, their changes can silently conflict and inconsistent views lead to integration failures. Existing multi-agent systems address this through workspace isolation (e.g., one git worktree per agent), but this defers conflict resolution to a post-hoc merge step where recovery is expensive. In this paper, we propose STORM, i.e., STate-ORiented Management for multi-agent collaboration. Specifically, STORM manages agent states by mediating their interactions with the shared workspace, ensuring that each agent operates on a consistent view of the codebase and that conflicting edits are detected and resolved at write time. We evaluate STORM on Commit0 and PaperBench across multiple LLMs. STORM outperforms the git-worktree-based multi-agent baseline by +18.7 on Commit0-Lite and +1.4 on PaperBench, while achieving comparable or better cost efficiency. Combined with single-agent runs, STORM reaches highest scores of 87.6 and 78.2 on the two benchmarks respectively, suggesting that explicit state management is a more effective foundation for multi-agent collaboration than workspace isolation. STORM can also be plugged into any multi-agent system seamlessly.
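Detecting conflicts at write time, rather than at a later merge, can be illustrated with optimistic version checks on a mediated workspace. The Python sketch below uses that well-known pattern as a stand-in. The abstract does not describe STORM's actual mechanism, so the classes `SharedWorkspace` and `WriteConflict` and the per-file version counter are hypothetical.

```python
from typing import Dict, Tuple


class WriteConflict(Exception):
    """Raised when a write is based on a stale version of a file."""


class SharedWorkspace:
    """Mediates all reads and writes so stale edits are rejected immediately."""

    def __init__(self) -> None:
        self._files: Dict[str, Tuple[int, str]] = {}  # path -> (version, content)

    def read(self, path: str) -> Tuple[int, str]:
        """Return the current version and content (version 0 if absent)."""
        return self._files.get(path, (0, ""))

    def write(self, path: str, base_version: int, content: str) -> int:
        """Commit only if the writer saw the latest version of the file."""
        current, _ = self.read(path)
        if base_version != current:
            raise WriteConflict(
                f"{path}: edit based on v{base_version}, workspace is at v{current}"
            )
        self._files[path] = (current + 1, content)
        return current + 1


ws = SharedWorkspace()
seen_by_a, _ = ws.read("a.py")  # agents A and B both read version 0
seen_by_b, _ = ws.read("a.py")
ws.write("a.py", seen_by_a, "x = 1\n")  # A commits first, file is now v1
try:
    ws.write("a.py", seen_by_b, "x = 2\n")  # B's view is stale
    conflict_detected = False
except WriteConflict:
    conflict_detected = True
```

Agent B learns about the conflict at the moment of writing and can re-read and retry, instead of discovering it during a post-hoc merge.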

2605.20559 2026-05-21 stat.ML cs.LG stat.AP stat.ME

Group-Aware Matrix Estimation and Latent Subspace Recovery

Hamza Golubovic, Matthew Shen, Genevera I. Allen, Tarek M. Zikry

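AI summary: This paper proposes GAME, a convex estimator for group-specific low-rank matrix estimation in heterogeneous data. It regularizes with overlapping nuclear-norm penalties to recover subgroup-specific subspace structure while preserving local latent structure in a shared coordinate system, and experiments on several datasets show that it outperforms conventional low-rank methods under structured missingness.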

Comments 12 pages, 6 main figures, 1 main algorithm

Abstract

Modern matrix completion problems often involve heterogeneous data whose rows simultaneously belong to many meta-categories, such as demographic and age groups in recommendation systems, or region and recording session labels in neural electrophysiological experiments. Standard low-rank estimators impose a single global latent geometry, which can recover average structure but may smooth away subgroup-specific variation, especially when observations are unevenly distributed across groups. We introduce Group-Aware Matrix Estimation (GAME), a convex estimator for overlapping subgroup-wise low-rank matrix estimation. GAME regularizes category-specific submatrices through overlapping nuclear-norm penalties, allowing related groups to borrow information while preserving local latent structure in a shared coordinate system. We provide finite-sample guarantees for both reconstruction error and subgroup-specific subspace recovery, showing how performance depends on sampling density, subgroup rank, and overlap structure. Experiments on synthetic, recommendation, ecological, and neuroscience datasets show that GAME is most beneficial in structured missingness regimes, where subgroup-aware regularization improves both reconstruction accuracy and latent subspace fidelity. Across these benchmarks, GAME is competitive or best among global low-rank, side-information, and modern imputation baselines, with the largest gains when subgroups exhibit distinct low-rank structure.

2605.20552 2026-05-21 stat.ML cs.LG

Spectral bandits for smooth graph functions with applications in recommender systems

Spectral bandits for smooth graph functions with applications in recommender systems

Tomáš Kocák, Michal Valko, Rémi Munos, Branislav Kveton, Shipra Agrawal

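AI summary This paper studies a bandit problem for smooth functions on graphs and proposes a way to learn user preferences efficiently in recommender systems. By defining an effective dimension and giving algorithms that scale linearly in it, the paper achieves low-regret online learning.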

Comments Published at AAAI 2014 - SDMBD

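Details
AI-translated abstract

Smooth functions on graphs are widely used in manifold and semi-supervised learning. This paper studies a bandit problem in which the payoffs of the arms are smooth on a graph. The framework suits online learning problems that involve graphs, such as content-based recommendation. In this setting, each recommended item is a node whose expected rating is similar to those of its neighbors, and the goal is to recommend items with high expected ratings. We aim to design algorithms whose cumulative regret does not degrade with the number of nodes. In particular, we introduce the notion of an effective dimension, which is small in real-world graphs, and propose two algorithms that scale linearly in this dimension. Experiments on a real-world content recommendation problem show that a good estimator of user preferences over thousands of items can be learned from evaluations of just tens of nodes.

English abstract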

Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this paper, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learning problems that involve graphs, such as content-based recommendation. In this problem, each recommended item is a node and its expected rating is similar to those of its neighbors. The goal is to recommend items that have high expected ratings. We aim for algorithms whose cumulative regret does not scale poorly with the number of nodes. In particular, we introduce the notion of an effective dimension, which is small in real-world graphs, and propose two algorithms for solving our problem that scale linearly in this dimension. Our experiments on a real-world content recommendation problem show that a good estimator of user preferences for thousands of items can be learned from just tens of node evaluations.
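
To make "smooth on a graph" concrete, here is a small sketch using the standard graph-Laplacian quadratic form, which sums the squared differences of a function across weighted edges. The abstract does not state which smoothness measure the paper uses, so treat this choice, the function name, and the toy ratings as assumptions.

```python
def graph_smoothness(values, edges):
    """Laplacian quadratic form: sum of w * (f_i - f_j)^2 over weighted edges.

    Small values mean neighboring nodes have similar function values.
    """
    return sum(w * (values[i] - values[j]) ** 2 for i, j, w in edges)

# Path graph 0 - 1 - 2 with unit weights (toy data, not from the paper).
edges = [(0, 1, 1.0), (1, 2, 1.0)]
smooth_ratings = [4.0, 4.0, 4.0]  # neighbors agree exactly
rough_ratings = [0.0, 1.0, 3.0]   # neighbors disagree

smooth_score = graph_smoothness(smooth_ratings, edges)
rough_score = graph_smoothness(rough_ratings, edges)
```

Under this measure, an item whose expected rating is close to its neighbors' contributes little to the total, which matches the setting the abstract describes.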

2605.20545 2026-05-21 stat.ML cs.LG

Sample Complexity of Transfer Learning: An Optimal Transport Approach

Sample Complexity of Transfer Learning: An Optimal Transport Approach

Haoyang Cao, Xin Guo, Wenpin Tang, Guan Wang

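AI summary This paper analyzes the sample efficiency of transfer learning from an optimal transport viewpoint. When the data dimension d exceeds 3, the sample complexity of transfer learning is O(m^{-(α+1)/d}), versus O(m^{-p/d}) for direct learning, where α denotes the smoothness of the data distribution and p the smoothness of the optimal target model.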

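Details
AI-translated abstract

Transfer learning is a key technique for many machine learning/AI models with complex structure, such as large language models and generative AI. Its essence is to use knowledge from already-solved source tasks to solve a new target task, especially when the training sample size m for the latter is small. This paper rigorously analyzes the potential benefit of transfer learning in terms of sample efficiency. Specifically, from an optimal transport viewpoint, we find that when the data dimension d exceeds 3, the sample complexity of transfer learning is O(m^{-(α+1)/d}), where α denotes the smoothness of the data distribution, whereas that of direct learning is O(m^{-p/d}), where p denotes the smoothness of the optimal target model. Our finding theoretically supports the better sample efficiency of transfer learning when the target task is optimized over a family of not-so-smooth models (that is, highly complex networks, possibly with non-smooth activation functions). Taking image classification as an example, we demonstrate this sample efficiency numerically: in the data-hungry regime, transfer learning can significantly improve model performance.

English abstract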

Transfer learning is an essential technique for many machine learning/AI models of complex structures such as large language models and generative AI. The essence of transfer learning is to leverage knowledge from resolved source tasks for a new target task, especially when the sample size $m$ of the training data for the latter is low. In this work, we rigorously analyze the potential benefit of transfer learning in terms of sample efficiency. Specifically, taking an optimal transport viewpoint of transfer learning, we find that when the data dimension $d$ is higher than $3$, the sample complexity for transfer learning is $O(m^{-(\alpha+1)/d})$, with $\alpha$ indicating the smoothness of the data distribution, as opposed to the $O(m^{-p/d})$ sample complexity for direct learning with $p$ indicating the smoothness of the optimal target model. Our finding theoretically supports a better sample efficiency for transfer learning, when the target task is optimizing over a family of not-so-smooth models (i.e., highly complex networks with the possible use of non-smooth activation functions). Using image classification as an example, we numerically demonstrate the sample efficiency for transfer learning, that is, in the data-hungry regime, the model performance can be significantly improved by transfer learning.
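
The two bounds can be compared numerically. The sketch below evaluates both expressions with all hidden constants set to 1, which is an assumption made purely for illustration, and the values of m, d, α, and p are made up rather than taken from the paper.

```python
def compare_rates(m, d, alpha, p):
    """Evaluate the two bounds from the abstract with constants set to 1.

    transfer: m^{-(alpha + 1) / d}   (alpha: smoothness of the data distribution)
    direct:   m^{-p / d}             (p: smoothness of the optimal target model)
    """
    transfer = m ** (-(alpha + 1) / d)
    direct = m ** (-p / d)
    return transfer, direct

# Illustrative values only: a not-so-smooth target model (p = 1) and a
# smoother data distribution (alpha = 2) in dimension d = 10.
transfer, direct = compare_rates(m=1000, d=10, alpha=2.0, p=1.0)
```

With these inputs the transfer bound is the smaller of the two. More generally, the transfer bound decays faster exactly when α + 1 > p, which is the "not-so-smooth target model" case the abstract highlights.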

2605.20185 2026-05-21 cs.GR cs.CV

PiG-Avatar: Hierarchical Neural-Field-Guided Gaussian Avatars

PiG-Avatar: Hierarchical Neural-Field-Guided Gaussian Avatars

Julian Kaltheuner, Jan Spindler, Sina Kitz, Patrick Stotko, Reinhard Klein

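AI summary This paper presents PiG-Avatar, which uses a parametric body model only for kinematic transport and represents the avatar as Gaussians in a volumetric canonical space governed by a continuous neural field. This decouples the representation from template topology and enables high-fidelity reconstruction of complex clothing geometry and layered surfaces.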

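Details
AI-translated abstract

Existing Gaussian avatar methods typically parameterize geometry on the surface of a body template, which entangles the avatar's representation space with the template's deformation space and limits the capture of layered, off-body, and non-rigid clothing geometry. We present PiG-Avatar, which uses the parametric body model only for kinematic transport and represents the avatar as Gaussians anchored in a volumetric canonical space governed by a continuous neural field. This decouples the representation from template topology and avoids the geometric constraints of surface-based parameterizations. Kinematic coherence is maintained through 3D barycentric anchor transport, which guides motion without constraining geometry and lets anchors deviate freely from the template surface, yielding dense, stable temporal surface correspondences by construction. To make this unconstrained formulation tractable, we introduce dual-level spatially coherent optimization, combining Sobolev-preconditioned neural-field updates with a novel KNN-based preconditioning of the canonical anchor geometry. Together, these mechanisms induce a self-organization of anchor density: anchors migrate toward regions of high curvature, appearance variation, and non-coherent motion without explicit heuristics. As a result, complex clothing geometry and layered surfaces emerge as natural, high-fidelity outputs. The single representation also supports hierarchical reconstruction at multiple levels of detail, with coarse-level supervision propagating to finer levels through the shared field and the coupled anchor graph. On established benchmarks with complex clothing and challenging non-rigid motion, PiG-Avatar achieves state-of-the-art rendering quality, is robust to imperfect body-model initialization, and renders in real time at all levels of detail.

English abstract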

Existing Gaussian avatar methods typically parameterize geometry on a body-template surface, which entangles the avatar's representation space with the template's deformation space and limits the capture of layered, off-body, and non-rigid clothing geometry. We present PiG-Avatar, which addresses this limitation by using the parametric body model solely for kinematic transport, while representing the avatar as Gaussians anchored in a volumetric canonical space governed by a continuous neural field. This decouples representation from template topology, avoiding the geometric constraints of surface-based parameterizations. Kinematic coherence is maintained through 3D barycentric anchor transport, which guides motion without constraining geometry and allows anchors to deviate freely from the template surface, yielding dense, stable temporal surface correspondences by construction. To make this unconstrained formulation tractable, we introduce dual-level spatially coherent optimization, combining Sobolev-preconditioned neural-field updates with a novel KNN-based preconditioning of canonical anchor geometry. Together, these mechanisms induce an emergent self-organization of anchor density: anchors migrate toward regions of high curvature, appearance variation, and non-coherent motion without explicit heuristics. As a result, complex clothing geometry and layered surfaces emerge as natural, high-fidelity outputs. This single representation further supports hierarchical reconstruction across multiple levels of detail, with coarse-level supervision propagating to finer levels through the shared field and coupled anchor graph. On established benchmarks featuring subjects with complex clothing and challenging non-rigid motion, PiG-Avatar achieves state-of-the-art rendering quality, generalizes robustly to imperfect body model initialization, and renders in real time across all detail levels.

2605.19278 2026-05-21 q-fin.PM cs.LG

Do Better Volatility Forecasts Lead to Better Portfolios? Evidence from Graph Neural Networks

Do Better Volatility Forecasts Lead to Better Portfolios? Evidence from Graph Neural Networks

Rylan Wade

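AI summary This paper studies whether graph neural networks improve realized volatility forecasts and whether those forecasts improve portfolio performance. Using weekly realized volatility for 465 S&P 500 stocks over 2015-2025, it compares Heterogeneous Autoregressive and Long Short-Term Memory baselines against graph neural network models built on rolling-correlation, sector, and Granger-causal graphs, with and without macro regime features. The model with the lowest forecast error, the model with the highest cross-sectional ranking accuracy, and the model with the highest portfolio Sharpe ratio turn out to be three different models. Forecast accuracy, ranking quality, and portfolio performance are related but not equivalent. Graph volatility models add value only when the investment rule can exploit the cross-sectional structure they encode.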

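Details
AI-translated abstract

This paper tests whether graph neural networks improve realized volatility forecasts and whether those forecasts improve portfolio performance. Using weekly realized volatility for 465 S&P 500 stocks over 2015-2025, it compares Heterogeneous Autoregressive and Long Short-Term Memory baselines against graph neural network models built on rolling-correlation, sector, and Granger-causal graphs, with and without macro regime features. The empirical finding is that the model with the lowest forecast error, the model with the highest cross-sectional ranking accuracy, and the model with the highest portfolio Sharpe ratio are three different models. Forecast accuracy, ranking quality, and portfolio performance are related but not equivalent. Graph volatility models add value only when the investment rule can exploit the cross-sectional structure they encode.

English abstract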

This paper tests whether graph neural networks improve realized volatility forecasts and whether those forecasts improve portfolio performance. Using weekly realized volatility for 465 S&P 500 equities from 2015-2025, Heterogeneous Autoregressive and Long Short-Term Memory baselines are compared against GraphSAGE models built on rolling correlation, sector, and Granger-causal graphs, with and without macro regime features. The empirical finding is that the model with the lowest forecast MSE, the model with the highest cross-sectional ranking accuracy, and the model with the highest portfolio Sharpe ratio are three different models. Forecast accuracy, ranking quality, and portfolio performance are related but not interchangeable objectives. Graph volatility models add value only when the portfolio rule can exploit the cross-sectional structure they encode.

2605.17164 2026-05-21 cs.DC cs.AI cs.LG cs.PL

Charon: A Unified and Fine-Grained Simulator for Large-Scale LLM Training and Inference

Charon: A Unified and Fine-Grained Simulator for Large-Scale LLM Training and Inference

Mengtian Yang, Zhekun Zhang, Mingheng Wu, Jianwen Yan, Hanshi Sun, Li-wen Chang

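AI summary This paper presents Charon, a unified, modular, and fine-grained simulator that accurately predicts LLM performance. Experiments show high accuracy across models and configurations, with prediction error below 5.35%, and in a practical inference deployment Charon found a configuration that improved system throughput, demonstrating its practical value.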

Comments Accepted by MLSys 2026

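Details
AI-translated abstract

In large-scale large language model (LLM) training and inference, achieving optimal performance is extremely challenging because of the complex design space of parallelism strategies, system optimizations, and hardware configurations. Accurate and fast performance simulation is essential for guiding optimization efforts and system studies by validating "what-if" hypotheses. To this end, we introduce Charon, a unified, modular, and fine-grained simulator for accurately predicting LLM performance. Experiments show that Charon is highly accurate across models and configurations, with overall prediction error consistently below 5.35%, and below 3.74% for training on a large-scale GPU cluster. In a practical inference deployment case, Charon found a configuration that improved system throughput over an engineering-tuned baseline, demonstrating its real-world value.

English abstract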

Deploying large-scale LLM training and inference with optimal performance is exceptionally challenging due to a complex design space of parallelism strategies, system optimizations, and hardware configurations. Accurate and rapid performance simulation is critical for guiding optimization efforts and system studies by validating "what-if" hypotheses. To address this, we introduce Charon, a unified, modular, and fine-grained simulator for accurately predicting LLM performance. Experiments show Charon achieves high accuracy across different models and configurations, with an overall prediction error consistently under 5.35%, and even under 3.74% for training with a large-scale GPU cluster. In a practical inference deployment case, Charon discovered a configuration that improved system throughput over an engineering-tuned baseline, demonstrating its significant real-world value.

2605.16428 2026-05-21 cs.IR cs.AI

The Impact of AI Search on the Online Content Ecosystem: Evidence from Google and Reddit

The Impact of AI Search on the Online Content Ecosystem: Evidence from Google and Reddit

Peibo Zhang, Ruomeng Cui, Dennis J. Zhang

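AI summary This paper studies the impact of AI search on the online content ecosystem using Google AI Overviews and Reddit. AI Overviews raise engagement in Safe-for-Work communities, but the interactive AI Mode weakens this effect.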

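Details
AI-translated abstract

Search engines have traditionally complemented online content platforms by directing information-seeking users to external websites. Generative AI search tools that summarize answers directly on the results page may disrupt this relationship by making visits to source platforms optional. We study this question using Google AI Overviews and Reddit, one of the largest online discussion platforms. Our identification exploits Google's content moderation policy: Safe-for-Work (SFW) Reddit communities are indexed by Google organic search and surfaced in Google AI Overviews, whereas Not-Safe-for-Work (NSFW) communities, although indexed by organic search, may not be referenced in AI Overview summaries. Using a difference-in-differences design, we find that AI Overviews raise engagement in SFW communities: daily comments increase by 12.0 percent and the number of commenting users by 12.3 percent relative to NSFW communities. The effects are concentrated in experience-based discussions (opinions, advice, and personal experiences) rather than fact-based information. However, the later introduction of Google AI Mode, which lets users interact conversationally with the AI summary, largely eliminates these gains in experience-based content. These results indicate that the effects of AI search depend heavily on interface design and content type.

English abstract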

Search engines traditionally complement online content platforms by directing users seeking information to external websites. The emergence of generative AI search tools that summarize answers directly on the results page may disrupt this relationship by making visits to source platforms optional. We study this question using Google AI Overviews and Reddit, one of the largest online discussion platforms. Our identification exploits Google's content moderation policy: Safe-for-Work (SFW) Reddit communities are indexed by Google organic search and surfaced in Google AI Overviews, while Not-Safe-for-Work (NSFW) communities, though indexed by organic search, are prohibited from being referenced in AI Overview summaries. Using a difference-in-differences design, we find that AI Overviews increase engagement in SFW communities: daily comments rise by 12.0 percent and the number of commenting users by 12.3 percent relative to NSFW communities. The effects are concentrated in experience-based discussions (opinions, advice, and personal experiences) rather than fact-based information. However, the subsequent introduction of Google AI Mode, which allows users to interact conversationally with the AI summary, largely eliminates these gains in experience-based content. These results suggest that the effects of AI search depend critically on interface design and types of content.
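
For readers unfamiliar with the design, the basic two-group, two-period difference-in-differences estimate can be sketched as below. This is the textbook form only. The paper's actual specification (controls, fixed effects, outcome transformation) is not given in the abstract, and the numbers are invented for illustration.

```python
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-by-two difference-in-differences on group means.

    Subtracting the control group's change removes trends shared by both
    groups, under the usual parallel-trends assumption.
    """
    return (treated_post - treated_pre) - (control_post - control_pre)

# Hypothetical mean log daily comments before and after the rollout.
# "Treated" stands in for SFW communities and "control" for NSFW communities.
effect = did_estimate(
    treated_pre=1.0, treated_post=1.5,
    control_pre=2.0, control_post=2.2,
)
```

When the outcome is in logs, a small estimate can be read approximately as a relative change, which is one common way to report percentage effects like those in the abstract.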

2605.15305 2026-05-21 cs.GR cs.LG

WorldParticle: Unified World Simulation of Lagrangian Particle Dynamics via Transformer

WorldParticle: Unified World Simulation of Lagrangian Particle Dynamics via Transformer

Caoliwen Wang, Minghao Guo, Siyuan Chen, Heng Zhang, Mengdi Wang, Xingyu Ni, Hanson Sun, Kunyi Wang, Zherong Pan, Kui Wu, Lingjie Liu, Yin Yang, Chenfanfu Jiang, Taku Komura, Wojciech Matusik, Peter Yichen Chen

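AI summary This paper presents a Transformer-based particle simulator that handles cloth, elastic solids, Newtonian fluids, non-Newtonian fluids, granular materials, and molecular dynamics in one model. A prediction-correction design on a shared particle representation gives efficient simulation and generalization.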

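Details
AI-translated abstract

A unified simulator that can model diverse physical phenomena without solver-specific redesign has been a long-standing goal in simulation science. We present a learning-based particle simulator, built on a single Transformer architecture, that models cloth, elastic solids, Newtonian fluids, non-Newtonian fluids, granular materials, and molecular dynamics. The model follows a prediction-correction design on a shared Lagrangian particle representation. An explicit predictor first advances particles under the known external forces, producing an intermediate state that captures externally driven motion but not inter-particle interactions. A learned corrector then predicts residual position and velocity updates in three stages: a particle tokenizer encodes local particle-particle, particle-boundary, and topology-guided interactions; a super-token encoder merges particle tokens into a compact set of super tokens through alternating self-attention and token merging; and a super-token decoder lifts these super tokens back to particle resolution through cross-attention to predict per-particle position and velocity corrections. Progressive token merging lowers the attention cost of successive encoder layers by halving the token count at each level, and the decoder communicates through the compact super-token set rather than full particle-to-particle attention. Across the six dynamics categories, the same architecture generalizes to unseen materials, boundary configurations, initial conditions, and external forces. We further demonstrate downstream interactive control, inverse design, and learning from real-world manipulation data, reducing the need for per-phenomenon solver engineering.

English abstract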

A unified simulator that can model diverse physical phenomena without solver-specific redesign is a long-standing goal across simulation science. We present a learning-based particle simulator built on a single transformer architecture to model cloth, elastic solids, Newtonian and non-Newtonian fluids, granular materials, and molecular dynamics. Our model follows a prediction-correction design on a shared Lagrangian particle representation. An explicit predictor first advances particles under the known external forces, producing an intermediate state that captures externally driven motion but not inter-particle interactions. A learned corrector then predicts the residual position and velocity updates through three stages: a particle tokenizer that encodes local particle-particle, particle-boundary, and topology-guided interactions; a super-token encoder that hierarchically merges particle tokens into a compact set of super tokens via alternating self-attention and token merging; and a super-token decoder that lifts these super tokens back to particle resolution through cross-attention to predict per-particle position and velocity corrections. Progressive token merging reduces the attention cost at successive encoder layers by halving the token count at each level, and the decoder communicates through the compact super-token set rather than full particle-to-particle attention. Across the six dynamics categories, the same architecture generalizes to unseen materials, boundary configurations, initial conditions, and external forces. We further demonstrate downstream interactive control, inverse design, and learning from real-world manipulation data, reducing the need for per-phenomenon solver engineering.
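
The effect of halving the token count can be estimated with a back-of-the-envelope cost model. The sketch below assumes self-attention cost grows with the square of the token count and that the first encoder level attends over all particles. Both are simplifying assumptions for illustration, and the particle and level counts are made up.

```python
def merged_attention_cost(num_particles, num_levels):
    """Token counts and summed quadratic attention cost with per-level halving."""
    counts = [num_particles // (2 ** level) for level in range(num_levels)]
    cost = sum(n * n for n in counts)  # pairwise interactions per level
    return counts, cost

counts, cost = merged_attention_cost(num_particles=1024, num_levels=4)

# Baseline for comparison: the same number of levels without any merging.
full_cost = 4 * 1024 * 1024
```

Under these assumptions the summed cost is a geometric series dominated by the first level, so it stays below 4/3 of one full-resolution attention pass no matter how many levels are added.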

2605.08731 2026-05-21 cs.PF cs.LG

Single-Thread JPEG Decoder Benchmarks Mis-Evaluate ML Data Loaders

Single-Thread JPEG Decoder Benchmarks Mis-Evaluate ML Data Loaders

Vladimir Iglovikov, Dmitry Kosarevsky

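AI summary This paper evaluates Python-accessible JPEG decode paths on five matched 16-vCPU Google Cloud CPUs and finds that single-thread benchmarks do not accurately assess ML data loader performance, revealing how architectures and decoders differ under multi-worker DataLoader workloads.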

Comments 10 pages, 4 figures. Code and data: https://github.com/ternaus/imread_benchmark

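Details
AI-translated abstract

JPEG decoding is routine machine learning infrastructure, yet the choice of Python decoder is usually based on single-process, single-thread microbenchmarks. We audit this evaluation assumption with thirteen Python-accessible JPEG decode paths on five matched 16-vCPU Google Cloud CPUs (Intel Emerald Rapids, AMD Zen 4, AMD Zen 5, ARM Neoverse V2, and ARM Neoverse N1). ImageNet validation is the workload, not a new dataset contribution: each run decodes the full 50,000-image split from memory and reports single-thread throughput for all decoders and, for eligible decoders, PyTorch DataLoader throughput at worker counts {0,2,4,8} together with decoder skip behavior. The evaluation protocol changes the supported conclusion. On Neoverse V2, imageio ranks ninth in single-thread throughput but reaches the top DataLoader tier alongside torchvision; on Zen 4, torchvision rises from seventh in single-thread throughput to the top measured DataLoader tier; on Neoverse N1, imagecodecs leads in single-thread throughput but ranks fifth at peak DataLoader throughput. We also find that worker-count conclusions differ between Zen 4 and Zen 5, that TensorFlow incurs a large single-thread penalty on ARM, and that strict native JPEG decoders/wrappers reject the same rare ImageNet JPEG. For PyTorch DataLoader workloads, torchvision and simplejpeg form the strongest zero-skip tier: torchvision has the highest mean normalized throughput, while simplejpeg has the highest minimum. OpenCV remains a robust general-purpose fallback on every tested CPU, at above 90% of the platform-local winner. We release the raw JSON, the generated tables/figures, and an executable local/cloud benchmark framework.

English abstract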

JPEG decode is routine ML infrastructure, but Python decoder choices are often justified by single-process, single-thread microbenchmarks. We audit this evaluation assumption with thirteen Python-accessible JPEG decode paths on five matched 16 vCPU Google Cloud CPUs: Intel Emerald Rapids, AMD Zen 4, AMD Zen 5, ARM Neoverse V2, and ARM Neoverse N1. ImageNet validation is the workload, not a new dataset contribution: each run decodes the full 50,000-image split from memory and reports single-thread throughput for all decoders, PyTorch DataLoader throughput for eligible decoders at worker counts $\{0,2,4,8\}$, and decoder skip behavior. The evaluation protocol changes the supported conclusion. On Neoverse V2, imageio is ninth in single-thread throughput yet lands in the top DataLoader tier with torchvision; on Zen 4, torchvision rises from seventh single-thread to the top measured DataLoader tier; on Neoverse N1, imagecodecs is the single-thread leader but fifth at peak DataLoader throughput. We also find that worker-count conclusions differ between Zen 4 and Zen 5, TensorFlow has a large single-thread ARM penalty, and strict native JPEG decoders/wrappers reject the same rare ImageNet JPEG. For PyTorch DataLoader workloads, torchvision and simplejpeg form the strongest measured zero-skip tier: torchvision has the highest mean normalized throughput, while simplejpeg has the highest minimum. OpenCV remains a robust general-purpose fallback above 90% of the platform-local winner on every tested CPU. We release raw JSON, generated tables/figures, and an executable local/cloud benchmark framework.

2605.04128 2026-05-21 cs.GR cs.AI cs.CL cs.CV cs.LG

JoyAI-Image: Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation

JoyAI-Image: Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation

Lin Song, Wenbo Li, Guoqing Ma, Wei Tang, Bo Wang, Yuan Zhang, Yijun Yang, Yicheng Xiao, Jianhui Liu, Yanbing Zhang, Guohui Zhang, Wenhu Zhang, Hang Xu, Nan Jiang, Xin Han, Haoze Sun, Maoquan Zhang, Haoyang Huang, Nan Duan

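AI summary This paper presents JoyAI-Image, a unified multimodal foundation model for visual understanding, text-to-image generation, and instruction-guided image editing. The model couples a spatially enhanced Multimodal Large Language Model (MLLM) with a Multimodal Diffusion Transformer (MMDiT), so that perception and generation interact through a shared multimodal interface. A scalable training recipe combines unified instruction tuning, long-text rendering supervision, spatially grounded data, and general and spatial editing signals, giving the model broad multimodal capability while strengthening geometry-aware reasoning and controllable visual synthesis. Experiments show state-of-the-art or highly competitive performance on understanding, generation, long-text rendering, and editing benchmarks. More importantly, the bidirectional loop between enhanced understanding, controllable spatial editing, and novel-view-assisted reasoning moves the model beyond general visual competence toward stronger spatial intelligence.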

Comments Code: https://github.com/jd-opensource/JoyAI-Image

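Details
AI-translated abstract

We present JoyAI-Image, a unified multimodal foundation model for visual understanding, text-to-image generation, and instruction-guided image editing. JoyAI-Image couples a spatially enhanced Multimodal Large Language Model (MLLM) with a Multimodal Diffusion Transformer (MMDiT), so that perception and generation interact through a shared multimodal interface. Around this architecture, we build a scalable training recipe that combines unified instruction tuning, long-text rendering supervision, spatially grounded data, and both general and spatial editing signals. This design gives the model broad multimodal capability while strengthening geometry-aware reasoning and controllable visual synthesis. Experiments on understanding, generation, long-text rendering, and editing benchmarks show that JoyAI-Image achieves state-of-the-art or highly competitive performance. More importantly, the bidirectional loop between enhanced understanding, controllable spatial editing, and novel-view-assisted reasoning moves the model beyond general visual competence toward stronger spatial intelligence. These results suggest that unified visual models are promising for downstream applications such as vision-language-action systems and world models.

English abstract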

We present JoyAI-Image, a unified multimodal foundation model for visual understanding, text-to-image generation, and instruction-guided image editing. JoyAI-Image couples a spatially enhanced Multimodal Large Language Model (MLLM) with a Multimodal Diffusion Transformer (MMDiT), allowing perception and generation to interact through a shared multimodal interface. Around this architecture, we build a scalable training recipe that combines unified instruction tuning, long-text rendering supervision, spatially grounded data, and both general and spatial editing signals. This design gives the model broad multimodal capability while strengthening geometry-aware reasoning and controllable visual synthesis. Experiments across understanding, generation, long-text rendering, and editing benchmarks show that JoyAI-Image achieves state-of-the-art or highly competitive performance. More importantly, the bidirectional loop between enhanced understanding, controllable spatial editing, and novel-view-assisted reasoning enables the model to move beyond general visual competence toward stronger spatial intelligence. These results suggest a promising path for unified visual models in downstream applications such as vision-language-action systems and world models.

2604.23937 2026-05-21 physics.flu-dyn cs.LG

Multi-scale Dynamic Wake Modeling and Prediction of Floating Offshore Wind Turbines via Physics-Informed Neural Networks and Fourier Neural Operators

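Multi-scale Dynamic Wake Modeling and Prediction of Floating Offshore Wind Turbines via Physics-Informed Neural Networks and Fourier Neural Operators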

Guodan Dong, Jianhua Qin, Chang Xu

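AI summary This paper models and predicts the multi-scale dynamic wakes of floating offshore wind turbines with physics-informed neural networks and Fourier neural operators. On a high-fidelity dataset, FNOs show advantages in efficiency, long-term predictive capability, and fidelity of multi-scale coherent structures.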

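Details
AI-translated abstract

Multi-scale dynamic wake modeling and prediction are essential for the real-time control and optimization of floating offshore wind turbines (FOWTs). In this study, wakes generated under coupled surge and pitch motions across a range of Strouhal numbers (St) are modeled with two deep-learning frameworks: physics-informed neural networks (PINNs) and Fourier neural operators (FNOs). The high-fidelity dataset comes from large-eddy simulations with the actuator line model (LES-AL). The results show that both frameworks model the dominant large-scale dynamic structures, such as wake meandering, well; however, FNOs clearly outperform the PINN model in efficiency (8-fold computational speedup and 40-fold faster convergence), long-term predictive capability, and fidelity of multi-scale coherent structures. Moreover, the wakes predicted by the PINN model show a smoothing effect that limits the resolution of high-frequency coherent structures and underestimates turbulent fluctuations at the wake center and half-width. Spectral analysis shows that FNOs resolve the primary meandering frequency (where Stp denotes the frequency induced by the coupled surge and pitch motions), its higher-order harmonics (2Stp, 3Stp), and the energy cascade. In contrast, the energy cascade in the PINN predictions decays faster in the high-frequency range (St > 1.0). In addition, the pre-multiplied power spectral density indicates that the energy at the meandering and harmonic frequencies modeled by the PINN is low relative to CFD and FNOs. These findings suggest that FNOs are promising for high-fidelity, real-time modeling of FOWT wakes.

English abstract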

Multi-scale dynamic wake modeling and prediction are essential for the real-time control and optimization of floating offshore wind turbines (FOWTs). In this study, wakes of FOWTs under coupled surge and pitch motions across a range of Strouhal numbers (St), which can induce wake meandering, are modeled via two novel deep-learning frameworks: physics-informed neural networks (PINNs) and Fourier neural operators (FNOs). The high-fidelity dataset is obtained from large-eddy simulations with the actuator line model (LES-AL). The results demonstrate that the dominant large-scale dynamic structures, such as meandering, can be well modeled by both frameworks; however, FNOs exhibit significant advantages over the PINN model in terms of efficiency (8-fold computational speedup and 40-fold faster convergence), long-term predictive capability, and multi-scale coherent structural fidelity. Furthermore, the wakes predicted by the PINN model exhibit a smoothing effect that limits the resolution of high-frequency coherent structures and underestimates turbulent fluctuations in both the wake center and half-width. Spectral analysis reveals that FNOs resolve the primary meandering frequency (where Stp denotes the frequency induced by the coupled surge and pitch motions), its corresponding higher-order harmonics (2Stp, 3Stp), and the energy cascade. In contrast, the energy cascade in the PINN predictions dissipates more rapidly in the high-frequency regime (St > 1.0). Additionally, the pre-multiplied power spectral density indicates that the energy contained in meandering and the corresponding harmonic frequencies modeled by PINNs is relatively low compared to that in CFD and FNOs. These findings suggest that FNOs are promising for the high-fidelity, real-time modeling of FOWT wakes.

2603.28103 2026-05-21 cs.DL cs.AI cs.IR

Transcription and Recognition of Italian Parliamentary Speeches Using Vision-Language Models

Transcription and Recognition of Italian Parliamentary Speeches Using Vision-Language Models

Luigi Curini, Alfio Ferrara, Giovanni Pagano, Sergio Picascia

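AI summary This paper proposes a pipeline based on vision-language models for the automatic transcription, semantic segmentation, and entity linking of Italian parliamentary speeches, improving transcription quality and speaker tagging.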

Comments to be published in: ParlaCLARIN V: Interoperability, Multilinguality, and Multimodality in Parliamentary Corpora, organized within the 15th Language Resource and Evaluation Conference (2026)

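Details
AI-translated abstract

Parliamentary proceedings are a rich yet challenging resource for computational analysis, particularly when preserved only as scanned historical documents. Existing efforts to transcribe Italian parliamentary speeches rely on traditional optical character recognition pipelines, leading to transcription errors and limited semantic annotation. In this paper, we propose a pipeline based on vision-language models for the automatic transcription, semantic segmentation, and entity linking of Italian parliamentary speeches. The pipeline uses a specialized OCR model to extract text while preserving reading order, followed by a large-scale vision-language model that performs transcription refinement, element classification, and speaker identification by jointly reasoning over visual layout and textual content. Extracted speakers are then linked to the Chamber of Deputies knowledge base through SPARQL queries and a multi-strategy fuzzy matching procedure. Evaluation against an established benchmark shows substantial improvements in transcription quality and speaker tagging.

English abstract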

Parliamentary proceedings represent a rich yet challenging resource for computational analysis, particularly when preserved only as scanned historical documents. Existing efforts to transcribe Italian parliamentary speeches have relied on traditional Optical Character Recognition pipelines, resulting in transcription errors and limited semantic annotation. In this paper, we propose a pipeline based on Vision-Language Models for the automatic transcription, semantic segmentation, and entity linking of Italian parliamentary speeches. The pipeline employs a specialised OCR model to extract text while preserving reading order, followed by a large-scale Vision-Language Model that performs transcription refinement, element classification, and speaker identification by jointly reasoning over visual layout and textual content. Extracted speakers are then linked to the Chamber of Deputies knowledge base through SPARQL queries and a multi-strategy fuzzy matching procedure. Evaluation against an established benchmark demonstrates substantial improvements both in transcription quality and speaker tagging.

2603.27309 2026-05-21 cs.GR cs.CV

MeshTailor: Cutting Seams via Generative Mesh Traversal

MeshTailor: Cutting Seams via Generative Mesh Traversal

Xueqi Ma, Xingguang Yan, Congyue Zhang, Hui Huang

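AI summary This paper presents MeshTailor, the first mesh-native generative framework for synthesizing edge-aligned seams on 3D surfaces. Unlike prior optimization-based or extrinsic learning-based methods, MeshTailor operates directly on the mesh graph, eliminating projection artifacts and fragile snapping heuristics. It introduces ChainingSeams, a hierarchical serialization of the seam graph that orders chains coarse-to-fine from global structural cuts down to local details, together with a dual-stream encoder that fuses topological and geometric context. Using this hierarchical representation and the dual-stream vertex embeddings, the MeshTailor Transformer traces seams vertex by vertex within local neighborhoods through an autoregressive pointer layer. Extensive evaluations show that MeshTailor produces more coherent and structurally regular seam layouts than recent optimization-based and learning-based baselines.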

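Details
AI-translated abstract

We present MeshTailor, the first mesh-native generative framework for synthesizing edge-aligned seams on 3D surfaces. Unlike prior optimization-based or extrinsic learning-based methods, MeshTailor operates directly on the mesh graph, eliminating projection artifacts and fragile snapping heuristics. We introduce ChainingSeams, a hierarchical serialization of the seam graph that orders chains coarse-to-fine from global structural cuts down to local details, and a dual-stream encoder that fuses topological and geometric context. Using this hierarchical representation and the dual-stream vertex embeddings, our MeshTailor Transformer traces seams vertex by vertex within local neighborhoods through an autoregressive pointer layer. Extensive evaluations show that MeshTailor produces more coherent and structurally regular seam layouts than recent optimization-based and learning-based baselines.

English abstract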

We present MeshTailor, the first mesh-native generative framework for synthesizing edge-aligned seams on 3D surfaces. Unlike prior optimization-based or extrinsic learning-based methods, MeshTailor operates directly on the mesh graph, eliminating projection artifacts and fragile snapping heuristics. We introduce ChainingSeams, a hierarchical serialization of the seam graph that orders chains from global structural cuts down to local details in a coarse-to-fine manner, and a dual-stream encoder that fuses topological and geometric context. Leveraging this hierarchical representation and dual-stream vertex embeddings, our MeshTailor Transformer utilizes an autoregressive pointer layer to trace seams vertex-by-vertex within local neighborhoods. Extensive evaluations show that MeshTailor produces more coherent and structurally regular seam layouts compared to recent optimization-based and learning-based baselines.

2603.25898 2026-05-21 eess.SY cs.AI cs.SE cs.SY

On Integrating Resilience and Human Oversight into LLM-Assisted Modeling Workflows for Digital Twins

On Integrating Resilience and Human Oversight into LLM-Assisted Modeling Workflows for Digital Twins

Lekshmi P, Neha Karanjkar

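AI summary This paper proposes three key design principles for integrating resilience and human oversight into LLM-assisted Digital Twin modeling workflows. Drawing on the FactoryFlow framework, it examines how to improve robustness and interpretability by orthogonalizing structural modeling and parameter fitting, restricting the model IR to parameterized, pre-validated library components, and using a density-preserving IR.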

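Details
AI-translated abstract

LLM-assisted modeling has the potential to rapidly build executable Digital Twins of complex systems from coarse descriptions and sensor data. However, resilience to LLM hallucination, human oversight, and real-time model adaptability remain challenging and often mutually conflicting requirements. We present three key design principles for integrating resilience and oversight into such workflows, derived from our work on FactoryFlow, an open-source LLM-assisted framework for building simulation-based Digital Twins of manufacturing systems. First, orthogonalize structural modeling and parameter fitting. Structural descriptions (components, interconnections) are translated by an LLM from coarse natural language into an intermediate representation (IR), visualized and validated by a human, and then converted algorithmically into the final model. Parameter inference, in contrast, runs continuously on sensor data streams with expert-tunable controls. Second, restrict the model IR to interconnections of parameterized, pre-validated library components rather than monolithic simulation code, which enables interpretability and error resilience. Third, and most important, use a density-preserving IR. When IR descriptions expand dramatically from compact inputs, hallucination errors accumulate proportionally. We make the case for Python as a density-preserving IR: loops express regularity compactly, classes capture hierarchy and composition, and the result remains highly readable while exploiting the strong code generation capabilities of LLMs. A key contribution is a detailed characterization of LLM-induced errors across model descriptions of varying detail and complexity, revealing how the choice of IR critically affects error rates. These insights provide actionable guidance for building robust and transparent LLM-assisted simulation automation workflows.

English abstract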

LLM-assisted modeling holds the potential to rapidly build executable Digital Twins of complex systems from only coarse descriptions and sensor data. However, resilience to LLM hallucination, human oversight, and real-time model adaptability remain challenging and often mutually conflicting requirements. We present three critical design principles for integrating resilience and oversight into such workflows, derived from insights gained through our work on FactoryFlow - an open-source LLM-assisted framework for building simulation-based Digital Twins of manufacturing systems. First, orthogonalize structural modeling and parameter fitting. Structural descriptions (components, interconnections) are LLM-translated from coarse natural language to an intermediate representation (IR) with human visualization and validation, which is algorithmically converted to the final model. Parameter inference, in contrast, operates continuously on sensor data streams with expert-tunable controls. Second, restrict the model IR to interconnections of parameterized, pre-validated library components rather than monolithic simulation code, enabling interpretability and error-resilience. Third, and most important, use a density-preserving IR. When IR descriptions expand dramatically from compact inputs, hallucination errors accumulate proportionally. We present the case for Python as a density-preserving IR: loops express regularity compactly, classes capture hierarchy and composition, and the result remains highly readable while exploiting LLMs' strong code generation capabilities. A key contribution is a detailed characterization of LLM-induced errors across model descriptions of varying detail and complexity, revealing how IR choice critically impacts error rates. These insights provide actionable guidance for building resilient and transparent LLM-assisted simulation automation workflows.
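
The "density-preserving IR" argument can be illustrated with a short sketch in which a loop describes a regular production line instead of listing every component by hand. All class names, component kinds, and parameters below are hypothetical and are not FactoryFlow's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    """A parameterized library component (hypothetical stand-in)."""
    name: str
    kind: str
    params: dict = field(default_factory=dict)

@dataclass
class Model:
    """Structural description: components plus their interconnections."""
    components: list = field(default_factory=list)
    connections: list = field(default_factory=list)

    def add(self, component):
        self.components.append(component)
        return component

    def connect(self, src, dst):
        self.connections.append((src.name, dst.name))

def build_line(num_stations):
    """Serial line: source -> (machine -> buffer) * num_stations -> sink."""
    model = Model()
    prev = model.add(Component("source", "Source"))
    for i in range(num_stations):
        # cycle_time is left unset here: in the workflow described above,
        # parameters would be fitted separately from sensor data.
        machine = model.add(Component(f"machine_{i}", "Machine", {"cycle_time": None}))
        buffer = model.add(Component(f"buffer_{i}", "Buffer", {"capacity": 5}))
        model.connect(prev, machine)
        model.connect(machine, buffer)
        prev = buffer
    model.connect(prev, model.add(Component("sink", "Sink")))
    return model

line = build_line(20)
```

The description stays the same length whether the line has 20 stations or 200, whereas a flat listing would grow with every station and give an LLM more places to introduce errors.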

2603.23890 2026-05-21 cs.SE cs.LG

Praxium: Diagnosing Cloud Anomalies with AI-based Telemetry and Dependency Analysis

Praxium: Diagnosing Cloud Anomalies with AI-based Telemetry and Dependency Analysis

Rohan Kumar, Jason Li, Zongshun Zhang, Syed Mohammad Qasim, Gianluca Stringhini, Ayse K. Coskun

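AI summary This paper presents Praxium, a framework that uses AI techniques for anomaly detection and root cause inference in cloud services, combining telemetry data with dependency analysis to make fault diagnosis faster and more accurate.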

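Details
AI-translated abstract

As the modern microservice architecture becomes more popular for cloud applications, cloud services are growing more complex and more vulnerable to misconfiguration and software bugs. Traditional approaches rely on expert input to diagnose and fix microservice anomalies, which does not scale under the continuous integration and continuous deployment (CI/CD) paradigm. Microservice rollouts contain new software installations that interact in complex ways with the components of an application. Attributing anomalous behavior to a specific installation or rollout therefore becomes harder, which can slow resolution. To address the gaps in current diagnostic methods, this paper introduces Praxium, a framework for anomaly detection and root cause inference. Praxium helps administrators evaluate target metric performance in the context of dependency installation information provided by the software discovery tool PraxiPaaS. Praxium continuously monitors telemetry data to identify anomalies and then performs root cause analysis via the causal impact of recent software installations, giving site reliability engineers (SREs) relevant information about an observed anomaly. We demonstrate that Praxium performs anomaly detection and root cause inference effectively, and we analyze how to tune anomaly detection hyperparameters for practical settings. Across 75 total trials using four synthetic anomalies, anomaly detection consistently achieves a macro-F1 above 0.97. We also show that causal impact analysis reliably infers the root cause of anomalies, even as package installations occur at increasingly short intervals.

English abstract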

As the modern microservice architecture for cloud applications grows in popularity, cloud services are becoming increasingly complex and more vulnerable to misconfiguration and software bugs. Traditional approaches rely on expert input to diagnose and fix microservice anomalies, which lacks scalability in the face of the continuous integration and continuous deployment (CI/CD) paradigm. Microservice rollouts, containing new software installations, have complex interactions with the components of an application. Consequently, this added difficulty in attributing anomalous behavior to any specific installation or rollout results in potentially slower resolution times. To address the gaps in current diagnostic methods, this paper introduces Praxium, a framework for anomaly detection and root cause inference. Praxium aids administrators in evaluating target metric performance in the context of dependency installation information provided by a software discovery tool, PraxiPaaS. Praxium continuously monitors telemetry data to identify anomalies, then conducts root cause analysis via causal impact on recent software installations, in order to provide site reliability engineers (SRE) relevant information about an observed anomaly. In this paper, we demonstrate that Praxium is capable of effective anomaly detection and root cause inference, and we provide an analysis on effective anomaly detection hyperparameter tuning as needed in a practical setting. Across 75 total trials using four synthetic anomalies, anomaly detection consistently performs at >0.97 macro-F1. In addition, we show that causal impact analysis reliably infers the correct root cause of anomalies, even as package installations occur at increasingly shorter intervals.
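
For reference, macro-F1 is the unweighted mean of the per-class F1 scores, so a rare class counts as much as a common one. The sketch below is a plain-Python implementation of that standard definition. The labels are toy data and unrelated to the paper's trials.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy labels: one normal sample is wrongly flagged as an anomaly.
y_true = ["normal", "normal", "anomaly", "anomaly"]
y_pred = ["normal", "anomaly", "anomaly", "anomaly"]
score = macro_f1(y_true, y_pred)
```

Because each class contributes equally, a detector cannot reach a high macro-F1 by predicting only the majority class, which makes the metric a natural fit for imbalanced anomaly detection.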

2603.21033 2026-05-21 cs.CE cs.LG

TabPFN Extensions for Interpretable Geotechnical Modelling

TabPFN Extensions for Interpretable Geotechnical Modelling

Taiga Saito, Yu Otake, Daijiro Mizutani, Stephen Wu

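AI summary This paper evaluates TabPFN and its extensions library on geotechnical tasks. Through soil-type classification and iterative parameter imputation, it illustrates TabPFN's strengths in uncertainty quantification and interpretability.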

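Details
AI-translated abstract

Geotechnical site characterization relies on sparse, heterogeneous borehole data, where uncertainty quantification and interpretability matter as much as predictive accuracy. We evaluate TabPFN and its tabpfn-extensions library on two geotechnical tasks: (1) soil-type classification from N-value and shear-wave velocity data as a controlled illustrative case, and (2) iterative imputation of five mechanical parameters (s_u, E_u, σ'_p, C_c, C_v) in BM/AirportSoilProperties/2/2025. Without retraining, we apply cosine-similarity analysis to TabPFN embeddings, visualize predictive distributions, and compute SHAP attributions. On the regression benchmark, we compare TabPFN with mean imputation, linear regression, random forests, XGBoost, and HBM; introduce a proxy decomposition of predictive uncertainty across context-perturbation classes; and propagate the marginal C_c and σ'_p distributions through a one-dimensional consolidation model to obtain the reliability index β and the serviceability exceedance probability P_f. The embeddings show label-consistent Clay/Sand grouping; iterative imputation reduces RMSE for all five targets, with TabPFN lowest on four; the SHAP attributions are consistent with the Skempton compression-index correlation and the inverse dependence of preconsolidation pressure on water content; and the within-posterior component is the largest in the proxy decomposition. We position the contribution as a worked evaluation workflow that may complement established methods for data-scarce geotechnics, not as an algorithmic innovation.

English abstract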

Geotechnical site characterisation relies on sparse, heterogeneous borehole data, where uncertainty quantification and interpretability matter as much as predictive accuracy. We evaluate TabPFN (Hollmann et al., 2025), a tabular foundation model, and its tabpfn-extensions library on two geotechnical tasks: (1) soil-type classification from N-value and shear-wave velocity data as a controlled illustrative case, and (2) iterative imputation of five mechanical parameters ($s_\mathrm{u}$, $E_\mathrm{u}$, $\sigma'_\mathrm{p}$, $C_\mathrm{c}$, $C_\mathrm{v}$) in BM/AirportSoilProperties/2/2025. Without retraining, we apply cosine-similarity analysis to TabPFN embeddings, visualise predictive distributions, and compute SHAP attributions. On the regression benchmark we compare TabPFN with mean imputation, linear regression, random forests, XGBoost, and HBM; introduce a proxy decomposition of predictive uncertainty across context-perturbation classes; and propagate marginal $C_\mathrm{c}$ and $\sigma'_\mathrm{p}$ distributions through a one-dimensional consolidation model to obtain the reliability index $\beta$ and serviceability exceedance probability $P_\mathrm{f}$. Embeddings exhibit label-consistent Clay/Sand grouping; iterative imputation reduces RMSE for all five targets, with TabPFN lowest on four; SHAP attributions are consistent with the Skempton compression-index correlation and the inverse preconsolidation-pressure-water-content dependence; the within-posterior component is largest in the proxy decomposition. We position the contribution as a worked evaluation workflow that may complement established methods for data-scarce geotechnics, not as algorithmic innovation.

2603.20420 2026-05-21 q-bio.GN cs.LG q-bio.QM

CRANE: Correcting Errors in Raw Nanopore Signals Using Hidden Markov Models


Simon Ambrozak, Ulysse McConnell, Bhargav Srinivasan, Burak Ozkan, Ernest Zhang, Can Firtina

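AI summary This paper proposes CRANE, which trains and uses a Hidden Markov Model (HMM) to correct errors in nanopore signals, improving the accuracy of raw signal analysis and reducing the burden of optimizing analysis pipelines without introducing substantial computational overhead.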

Details
AI abstract (translated from Chinese)

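Nanopore sequencing can read substantially longer nucleic acid sequences, called reads, than other sequencing methods, which has driven advances in genomic analysis such as the gapless human genome assembly. By analyzing the raw electrical signals that nanopore sequencing generates, existing methods can map these reads without first translating them into DNA characters (i.e., basecalling), enabling fast and efficient analysis of sequencing data. However, raw signals often contain errors caused by noise and processing errors, which limits the overall accuracy of raw signal analysis. The goal of this work is to detect and correct errors in raw signals in order to improve the accuracy of raw signal analysis. To this end, we propose CRANE, a mechanism that trains and uses a Hidden Markov Model (HMM) to accurately correct signal errors. Our extensive evaluation on various datasets shows that CRANE (1) consistently improves the overall accuracy of the underlying raw signal analysis tools, (2) minimizes the burden of optimizing analysis pipelines for newer nanopore technologies, and (3) does not introduce substantial computational overhead. We conclude that CRANE provides an effective way to systematically identify and correct errors in raw nanopore signals before further analysis, which can enable a new class of error correction mechanisms designed specifically for raw nanopore signals. Source code: CRANE is available at https://github.com/STORMgroup/CRANE. We also provide scripts on the GitHub page to fully reproduce our results.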

English abstract

Nanopore sequencing can read substantially longer sequences of nucleic acid molecules, called reads, than other sequencing methods, which has led to advances in genomic analysis such as the gapless human genome assembly. By analyzing the raw electrical signal reads that nanopore sequencing generates from molecules, existing works can map these reads without translating them into DNA characters (i.e., basecalling), allowing for quick and efficient analysis of sequencing data. However, raw signals often contain errors due to noise and processing errors, which limits the overall accuracy of raw signal analysis. Our goal in this work is to detect and correct errors in raw signals to improve the accuracy of raw signal analyses. To this end, we propose CRANE, a mechanism that trains and utilizes a Hidden Markov Model (HMM) to accurately correct signal errors. Our extensive evaluation on various datasets shows that CRANE 1) consistently improves the overall accuracy of the underlying raw signal analysis tools, 2) minimizes the burden of optimizing analysis pipelines for newer nanopore technologies, and 3) does not introduce substantial computational overhead. We conclude that CRANE provides an effective mechanism to systematically identify and correct the errors in raw nanopore signals before further analysis, which can enable the development of a new class of error correction mechanisms purely designed for raw nanopore signals. Source Code: CRANE is available at https://github.com/STORMgroup/CRANE. We also provide the scripts to fully reproduce our results on our GitHub page.
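
To give a sense of how an HMM can correct a noisy signal, here is a generic Viterbi-decoding sketch. It is not CRANE's model or code, and the abstract does not specify either. The sketch assumes hypothetical signal levels as hidden states, Gaussian emissions with a shared standard deviation, a uniform initial distribution, and a "sticky" transition matrix that favours staying in the current state. It decodes the most likely state path and replaces each sample with the mean level of its decoded state.

```python
import math


def viterbi(obs, means, sigma, stay):
    """Most likely hidden-state path for a Gaussian-emission HMM.

    obs   : list of observed signal samples
    means : mean signal level of each hidden state (hypothetical, >= 2 states)
    sigma : shared emission standard deviation (assumed)
    stay  : probability of remaining in the same state (assumed)
    """
    n = len(means)
    switch = (1.0 - stay) / (n - 1)
    log_trans = [[math.log(stay if i == j else switch) for j in range(n)]
                 for i in range(n)]

    def log_emit(x, k):
        # Gaussian log-likelihood; the constant term is dropped because
        # all states share the same sigma.
        return -0.5 * ((x - means[k]) / sigma) ** 2

    # Uniform initial state distribution.
    score = [math.log(1.0 / n) + log_emit(obs[0], k) for k in range(n)]
    back = []
    for x in obs[1:]:
        prev, score, pointers = score, [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + log_trans[i][j])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][j] + log_emit(x, j))
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def correct_signal(obs, means, sigma=0.5, stay=0.95):
    """Replace each sample with the mean of its decoded state."""
    return [means[k] for k in viterbi(obs, means, sigma, stay)]


if __name__ == "__main__":
    noisy = [0.1, -0.1, 0.9, 0.0, 0.1, 1.1, 0.9, 1.0]
    print(correct_signal(noisy, means=[0.0, 1.0]))
```

With a high `stay` probability, an isolated spike costs two state switches and is therefore absorbed into the surrounding level, while a sustained change in level is kept. In practice the state levels and transition probabilities would be learned from data rather than fixed by hand.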