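arXivDaily: a daily arXiv digest that syncs the full arXiv feed and adds AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.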

2605.15398 2026-05-18 cs.GR cs.CV

3DEditSafe: Defending 3D Editing Pipelines from Unsafe Generation

Nicole Meng, Zheyuan Liu, Meng Jiang, Yingjie Lao

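AI summary: This paper proposes the 3DEditSafe framework, which uses safety regularization to constrain the propagation of unsafe semantics, reducing unsafe content generated during 3D editing and revealing a tradeoff between safety and quality.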

Abstract

Recent advances in 3D generative editing, particularly pipelines based on 3D Gaussian Splatting (3DGS), have achieved high-fidelity, multi-view-consistent scene manipulation from text prompts. However, we find that these pipelines also introduce new safety risks when unsafe prompts produce edits that are propagated and optimized across views. In this work, we study unsafe generation in 3D editing pipelines and show that such behavior can lead to coherent, undesirable Not-Safe-For-Work (NSFW) content in the final 3D representation. To address this, we propose 3DEditSafe, a safety-regularized 3D editing framework that constrains unsafe semantic propagation during optimization. 3DEditSafe combines generation-stage safety guidance with rendered-view 3D safety regularization, safe semantic projection, residue suppression, and mask-aware preservation to steer optimization away from unsafe editing directions. We evaluate our approach on EditSplat scenes using an object-compatible unsafe prompt benchmark and show that 2D safety guidance alone is not consistently sufficient to prevent unsafe 3D edits. 3DEditSafe reduces unsafe semantic alignment and view-level attack success rates, while revealing a safety-quality tradeoff in which stronger unsafe suppression can introduce artifacts or reduce unsafe-prompt fidelity. To our knowledge, this work is the first attempt to study and defend against unsafe generation in text-driven 3D editing pipelines, highlighting the need for safety mechanisms that operate directly on optimized 3D representations.
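
The abstract lists safe semantic projection as one component without defining it. One common reading is to remove an edit embedding's component along an "unsafe" concept direction; the pure-Python sketch below assumes that reading. `project_out`, the example vectors, and the `strength` knob are hypothetical illustrations, not 3DEditSafe's implementation.

```python
import math


def project_out(v, u, strength=1.0):
    """Remove a fraction `strength` of the component of v along direction u.

    Hypothetical helper: v stands for an edit/conditioning embedding and u for
    an 'unsafe concept' direction. strength=1.0 removes the component fully.
    """
    norm = math.sqrt(sum(x * x for x in u))
    if norm == 0:
        raise ValueError("direction u must be non-zero")
    unit = [x / norm for x in u]
    coeff = sum(a * b for a, b in zip(v, unit))  # signed length of v along u
    return [a - strength * coeff * b for a, b in zip(v, unit)]


edit = [0.6, 0.8, 0.0]
unsafe_dir = [1.0, 0.0, 0.0]
print(project_out(edit, unsafe_dir))  # [0.0, 0.8, 0.0]
```

Setting `strength` below 1 keeps part of the original edit, which loosely mirrors the safety-quality tradeoff described in the abstract: stronger suppression changes the edit more.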

2605.15392 2026-05-18 physics.optics cs.CV

Frequency-domain Event-based Imaging for Selective Surveillance

Megan Birch, James Rick, Adrish Kar, Jason Zutty, Joseph L. Greene

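AI summary: This paper proposes the FRIES framework, which analyzes event data in the frequency domain to identify mechanical vibrations and rotating objects; combined with the RTS visualization technique, its effectiveness against dynamic backgrounds is validated in indoor and outdoor experiments.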

Comments 14 pages, 11 figures

Abstract

Event-based cameras (EBCs) are an attractive sensing modality for surveillance due to their reporting of pixel-level radiance changes with microsecond resolution and high dynamic range, enabling motion extraction while suppressing background. Their asynchronous, sparse output, however, necessitates algorithms that identify targets in event space without processing full frames. We introduce Frequency Rate Information for Event Space (FRIES), a neuromorphic processing framework that detects periodicity in events, such as rotor rotation and mechanical vibrations, to discriminate and monitor man-made objects. FRIES first applies a time gate to suppress background and noise, then aggregates events into a pixel-wise activity (e.g., density) map and clusters pixels into regions-of-interest (ROIs). A localized spectral analysis is applied to each ROI to extract dominant frequencies used to distinguish structured object signatures from unstructured background and noise. Discriminated targets are visualized using a Resonant Time Surface (RTS), a frequency-selective method that weights events by their phase coherence with the extracted frequencies, rewarding in-sync content and suppressing out-of-sync clutter. We demonstrate FRIES and RTS in a controlled indoor experiment to recover the rotational frequency of a mechanical chopper and drone rotors against a moving background. We further test these methods on outdoor data to detect a hovering drone against a realistic treeline. These preliminary results establish frequency-domain event processing as a promising front-end for selective surveillance in neuromorphic pipelines and a complementary surveillance modality, leveraging the high temporal resolution to enable spectral discrimination.
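
To make the per-ROI spectral step and the phase-coherence weighting concrete, the sketch below runs them on synthetic timestamps for a single ROI. It is a simplified, assumption-laden illustration rather than FRIES or RTS as published: the time gate and ROI clustering are omitted, the spectrum is a naive DFT of binned event counts, and the weight 0.5 * (1 + cos(2*pi*f*t + phase)) is a stand-in for the RTS weighting. The bin rate, window length, and function names are hypothetical.

```python
import cmath
import math
import random


def dominant_frequency(timestamps, fs, duration):
    """Bin one ROI's event times into counts at `fs` Hz and return the
    strongest non-DC frequency (Hz) and its phase (rad) via a naive DFT."""
    n = round(duration * fs)
    counts = [0.0] * n
    for t in timestamps:
        k = int(t * fs)
        if 0 <= k < n:
            counts[k] += 1.0
    mean = sum(counts) / n
    signal = [c - mean for c in counts]  # remove the DC component
    best_k, best_coef = 1, 0j
    for k in range(1, n // 2):  # positive frequencies below Nyquist
        coef = sum(x * cmath.exp(-2j * math.pi * k * m / n)
                   for m, x in enumerate(signal))
        if abs(coef) > abs(best_coef):
            best_k, best_coef = k, coef
    return best_k * fs / n, cmath.phase(best_coef)


def phase_coherence_weights(timestamps, freq, phase):
    """Score each event in [0, 1] by how well it lines up with the peaks of
    cos(2*pi*freq*t + phase): near 1 in sync, near 0 half a period off."""
    return [0.5 * (1.0 + math.cos(2 * math.pi * freq * t + phase))
            for t in timestamps]


# Synthetic ROI: event rate modulated at 50 Hz (rotor-like) plus random clutter.
random.seed(0)
fs, duration = 1000, 0.2
events = []
for m in range(round(fs * duration)):
    t = (m + 0.5) / fs  # centre of each 1 ms bin
    events += [t] * round(5 * (1 + math.cos(2 * math.pi * 50 * t)))
events += [random.uniform(0.0, duration) for _ in range(20)]

freq, phase = dominant_frequency(events, fs, duration)
print(freq)  # 50.0 for this synthetic input
weights = phase_coherence_weights(events, freq, phase)
```

Events near the oscillation peaks receive weights close to 1 and out-of-phase clutter close to 0 (up to a half-bin phase offset introduced by the binning), which is the intuition behind rewarding in-sync content.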

2605.15370 2026-05-18 quant-ph cs.LG

Quantum Feature Pyramid Gating for Seismic Image Segmentation

Taha Gharaibeh, Jyotsna Sharma

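AI summary: This paper proposes a quantum feature gating method that uses parameterized quantum circuits to perform feature fusion inside an encoder-decoder, improving seismic image segmentation accuracy and validating the effectiveness of quantum feature fusion for dense prediction.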

Abstract

Accurate salt-body delineation is essential for seismic interpretation because salt structures distort wave propagation, complicate velocity-model building, obscure reservoir geometry, and increase uncertainty in exploration and drilling decisions. Although hybrid quantum-classical models have shown competitive performance on small-scale image-classification tasks, their value for dense, pixel-level geophysical prediction remains largely untested. This work introduces quantum feature gating, a hybrid segmentation architecture that embeds a parameterized quantum circuit (PQC) at feature-fusion points within an encoder-decoder pipeline. A 4-qubit, 2-layer PQC with data re-uploading computes a learned convex combination of lateral and top-down features at each Feature Pyramid Network merge point. A global-average-pooling layer maps encoder features to a fixed 4-dimensional quantum input, decoupling the 72-parameter quantum budget from backbone size and image resolution. The method is evaluated on the 2018 TGS Salt Identification Challenge using 4,000 seismic images at 101 x 101 resolution, across two integration topologies, eight circuit variants, and six encoders with 8M to 118M parameters under five-fold cross-validation. In a controlled EfficientNetV2-L ablation at 256 x 256 resolution, replacing the three Quantum FPN Gates with element-wise addition while holding the encoder, loss schedule, splits, and threshold search fixed reduces mean IoU from 0.9389 to 0.8404, a 9.85 percentage-point gap. Inserting the same circuit as skip-connection attention in a custom U-Net improves IoU by 0.88 points over the SolidUNet baseline, showing that the PQC contribution depends on where and what it gates. These results provide controlled evidence that quantum feature fusion can improve dense seismic segmentation.
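
The gating mechanism (a circuit output that sets a learned convex combination of lateral and top-down features) can be mimicked with a tiny statevector simulation. This is a deliberately reduced, hypothetical sketch: one qubit with data re-uploading instead of the paper's 4-qubit, 2-layer, 72-parameter PQC, hand-picked rather than trained angles, and short Python lists standing in for feature maps. The names `gate_value` and `qfpn_gate` are invented for illustration.

```python
import cmath
import math


def ry(theta):
    """Single-qubit Y rotation as a 2x2 matrix."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]


def rz(theta):
    """Single-qubit Z rotation as a 2x2 matrix."""
    return [[cmath.exp(-0.5j * theta), 0], [0, cmath.exp(0.5j * theta)]]


def apply(gate, state):
    """Multiply a 2x2 gate into a 2-amplitude state vector."""
    return [gate[0][0] * state[0] + gate[0][1] * state[1],
            gate[1][0] * state[0] + gate[1][1] * state[1]]


def gate_value(x, weights, biases):
    """Toy single-qubit re-uploading circuit; returns P(|1>) in [0, 1].

    x: pooled input features; weights[l][i], biases[l]: angles for layer l.
    """
    state = [1 + 0j, 0j]  # start in |0>
    for layer_w, b in zip(weights, biases):
        for xi, wi in zip(x, layer_w):
            state = apply(ry(wi * xi), state)  # re-upload each feature
        state = apply(rz(b), state)  # trainable phase between uploads
    return abs(state[1]) ** 2  # probability of measuring |1>


def qfpn_gate(lateral, top_down, pooled, weights, biases):
    """Fuse two feature vectors as g * lateral + (1 - g) * top_down."""
    g = gate_value(pooled, weights, biases)
    fused = [g * a + (1 - g) * b for a, b in zip(lateral, top_down)]
    return fused, g


pooled = [0.2, -0.5, 0.9, 0.1]  # stand-in for global-average-pooled features
weights = [[1.0, 0.5, -0.7, 2.0], [0.3, -1.2, 0.8, 0.4]]  # 2 layers x 4 inputs
biases = [0.6, -0.3]
fused, g = qfpn_gate([1.0, 2.0], [3.0, 0.0], pooled, weights, biases)
print(g, fused)
```

Because g is a measurement probability, it always lies in [0, 1], so the fused output is a convex combination by construction; the ablation baseline in the abstract instead adds the two inputs element-wise.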

2605.15350 2026-05-18 math.OC cs.LG

Stochastic Compositional Optimization via Hybrid Momentum Frank--Wolfe

El Mahdi Chayti

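AI summary: This paper proposes a Hybrid Momentum Stochastic Frank-Wolfe algorithm that does not assume smoothness of the outer function F. By combining a momentum-based Jacobian tracker with a Taylor-corrected function tracker, it achieves an O(K^{-1/4}) convergence rate for non-convex objectives.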

Abstract

Stochastic compositional optimization minimizes objectives of the form $\min_{\bm{x} \in \mathcal{X}} F(\bm{f}(\bm{x}), \bm{x})$, where $\bm{f}$ is accessible only through noisy stochastic queries. Existing methods for this problem assume that the outer function $F$ is continuously differentiable, which excludes many practically important applications such as robust max-of-losses, Conditional Value-at-Risk, and norm regularizers. We propose the Hybrid Momentum Stochastic Frank--Wolfe algorithm, which drops the smoothness assumption on $F$. By combining a momentum-based Jacobian tracker with a Taylor-corrected function tracker, the algorithm feeds an entire stochastic linearization, rather than a single gradient, into a generalized linear minimization oracle. We establish an $\mathcal{O}(K^{-1/4})$ convergence rate in the generalized Frank--Wolfe gap for non-convex objectives with $L_F$-Lipschitz outer functions, matching the optimal complexity for projection-free single-sample stochastic methods under expected smoothness. The analysis extends to heavy-tailed noise oracles with bounded $r$-th moments for $r \in (1, 2]$ and recovers the deterministic rates of Vladarean et al. (2023) as the noise vanishes.
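
The projection-free loop this work generalizes is easiest to see in its plain form: average noisy gradients with momentum, call a linear minimization oracle (LMO) instead of projecting, and step toward the returned vertex. The sketch below shows only that building block on the probability simplex, where the LMO simply picks the coordinate with the smallest estimated gradient. It is not the Hybrid Momentum algorithm (there is no inner map f, Jacobian tracker, or Taylor correction, and it assumes a smooth objective), and the step-size and momentum schedules are illustrative choices.

```python
import random


def simplex_lmo(d):
    """Linear minimization oracle over the probability simplex:
    argmin_s <d, s> is the vertex e_i with the smallest d_i."""
    i = min(range(len(d)), key=lambda j: d[j])
    return [1.0 if j == i else 0.0 for j in range(len(d))]


def momentum_stochastic_fw(grad_oracle, x0, iters=2000, seed=0):
    """Plain stochastic Frank-Wolfe with a momentum gradient estimate."""
    rng = random.Random(seed)
    x = list(x0)
    d = grad_oracle(x, rng)  # running (momentum) estimate of the gradient
    for k in range(1, iters + 1):
        rho = 4.0 / (k + 8) ** (2.0 / 3.0)  # weight on the fresh sample
        gamma = 2.0 / (k + 8)               # Frank-Wolfe step size
        g = grad_oracle(x, rng)
        d = [(1 - rho) * di + rho * gi for di, gi in zip(d, g)]
        s = simplex_lmo(d)                  # no projection needed
        x = [(1 - gamma) * xi + gamma * si for xi, si in zip(x, s)]
    return x


# Usage: minimize ||x - c||^2 over the simplex using only noisy gradients.
c = [0.2, 0.3, 0.5]


def noisy_grad(x, rng):
    return [2 * (xi - ci) + rng.gauss(0.0, 0.1) for xi, ci in zip(x, c)]


x = momentum_stochastic_fw(noisy_grad, [1.0, 0.0, 0.0])
print([round(v, 2) for v in x])  # close to c; exact values depend on the noise
```

Changing the feasible set only changes the LMO; for an l1-ball constraint, for instance, it returns a signed, scaled coordinate vector.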

2605.15320 2026-05-18 cs.GR cs.CV cs.LG

FFAvatar: Few-Shot, Feed-Forward, and Generalizable Avatar Reconstruction

Thuan Hoang Nguyen, Jiahao Luo, Yinyu Nie, Hao Li, Gordon Guocheng Qian, Jian Wang

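AI summary: FFAvatar fuses information from multiple source images with a multi-view Query-Former to achieve high-fidelity 3D Gaussian avatar reconstruction, supporting real-time deployment and high-quality animation.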

Comments Project Page: https://ffavatar.github.io

2605.15312 2026-05-18 cs.CY cs.CV

Beyond Performance Disparities: A Three-Level Audit of Representational Harm in CelebA

Sieun Park, Yuanmo He

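AI summary: Through a three-level audit, this paper shows how gendered standards of age and beauty in the CelebA dataset are reproduced in the data and in models, and argues that representational harm leads to hyper-scrutiny of women and the exclusion of older men.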

Comments 15 pages, 8 figures

Abstract

Large-scale facial datasets like CelebA are widely used in computer vision, yet the cultural biases embedded in their labels remain underexplored. Fairness research has distinguished representational from allocational harms, but audits of computer vision datasets have mostly examined categorical labels, leaving open how such harms appear in learned features and model attention. This paper examines CelebA at three levels: dataset structure, learned feature weights, and spatial attention, focusing on how gendered double standards of ageing and beauty are encoded in the data and reproduced in model behaviour. First, hierarchical clustering of 202,599 images shows that the 39 attributes organise into latent trait bundles aligned with cultural archetypes: performative femininity (youth, makeup, adornment) and professional masculinity (ageing, facial hair, formal attire). Female faces, though more often rated attractive overall, incur steep penalties when assigned to ageing or masculine-coded clusters. Second, XGBoost with SHAP analysis reveals gender-specific effects, such as adiposity reducing attractiveness only for females. Third, Grad-CAM finds that predictions for female and younger male subgroups concentrate on mid-face cues, whereas predictions for older males drift toward peripheral cues such as hair and clothing. Older males attain the highest accuracy but the lowest average precision, indicating categorical exclusion of groups outside the dataset's evaluative templates. Cultural double standards thus pass from media representation into dataset labels, feature weights, and model attention, producing two representational harms: hyper-scrutiny of women under a narrow evaluative template, and exclusion of older men from the scheme entirely. Fairness metrics focused on performance disparities mask both, underscoring the need to address representational harm in fairness research.
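
Pairing the highest accuracy with the lowest average precision can look contradictory. One way it arises: when a subgroup has few positive labels and the model rarely scores them highly, thresholded accuracy stays high while a ranking metric such as AP drops. The toy numbers below are invented for illustration and are not CelebA results.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def average_precision(labels, scores):
    """Non-interpolated AP: mean precision at the rank of each positive
    (assumes no tied scores)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Toy subgroup: 1 positive in 10, and the model ranks it only third.
labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.30, 0.40, 0.35, 0.10, 0.05, 0.20, 0.15, 0.12, 0.08, 0.02]
print(accuracy(labels, scores))           # 0.9
print(average_precision(labels, scores))  # 0.3333333333333333
```

Accuracy rewards the model for correctly rejecting the many negatives, while AP depends only on where the rare positives land in the ranking.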

2605.15307 2026-05-18 cs.GR cs.CV cs.MM cs.SD

Sound Sparks Motion: Audio and Text Tuning for Video Editing

AmirHossein Naghi Razlighi, Aryan Mikaeili, Ali Mahdavi-Amiri, Daniel Cohen-Or, Yiorgos Chrysanthou

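AI summary: This paper proposes Sound Sparks Motion, a training-free framework that performs motion editing in videos by tuning the multimodal conditioning signals of an audio-visual generation model at test time. Tuning an audio latent and a residual perturbation of the text conditioning encourages motion changes, and feedback from a vision-language model improves the editing results.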

Comments Project Page: https://amirhossein-razlighi.github.io/Sound_Sparks_Motion

Abstract

Motion-centric video editing remains difficult for large generative video models, which often respond well to appearance changes but struggle to produce specific, localized actions or state transitions in an existing clip. We introduce Sound Sparks Motion, a training-free framework that enables motion editing in an audio-visual video generation model by tuning its internal multimodal conditioning signals at test time. Rather than modifying model weights, our method tunes only two lightweight variables: an audio latent derived from the source video and a residual perturbation in the text-conditioning. We find that this combination can encourage motion edits that the underlying model often struggles to realize under prompt-only control. Since there is no direct way to evaluate temporal alignment between text and motion, we guide the tuning process using a vision-language model that provides feedback indicating whether the intended motion appears in the generated video. This simple supervision yields an effective semantic objective for motion editing, while regularization and perceptual-temporal constraints help preserve content and visual quality. Beyond per-video tuning, we show that the learned latent controls are transferable across videos, suggesting that they capture reusable motion-edit directions rather than overfitting to a single example. Our results highlight multimodal conditioning tuning, particularly through the audio pathway, as a promising direction for motion-aware video editing, and suggest that test-time tuning can serve as a lightweight probing mechanism that helps reveal latent motion controls embedded in the model's multimodal conditioning. Code and data are available via our project page: https://amirhossein-razlighi.github.io/Sound_Sparks_Motion/

2605.15299 2026-05-18 cs.IR cs.AI

Fortress: A Case Study in Stabilizing Search Recommendations via Temporal Data Augmentation and Feature Pruning

Milind Pandurang Jagre, Jia Huang, Dayvid V. R. Oliveira, Zhinan Cheng, Babak Seyed Aghazadeh, Puja Das, Chris Alvino, Jinda Han, Kailash Thiyagarajan

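AI summary: Fortress stabilizes search recommendation models through temporal data augmentation and feature pruning, improving prediction stability and accuracy, with significant gains validated in a large-scale app marketplace.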

2605.15281 2026-05-18 cs.CR cs.AI

Autonomous Intelligent Agents for Natural-Language-Driven Web Execution with Integrated Security Assurance

Vinil Pasupuleti, Siva Rama Krishna Varma Bayyavarapu, Shrey Tyagi

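AI summary: This paper proposes an AI-based autonomous testing framework for natural-language-driven web execution with integrated security verification. The framework addresses the brittleness of traditional web test suites through five strategies: navigation reliability, context-aware selector generation, post-generation validation, intelligent wait injection, and failure learning. Experiments show that it substantially raises the script-generation success rate, reduces navigation failures and timing-related race conditions, and greatly cuts test-creation time. It can also automatically convert natural-language descriptions of attack scenarios into security probes, effectively uncovering a range of security vulnerabilities and offering a novel approach to natural-language-driven security testing.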

Comments 6 pages, 4 figures, 5 tables, IEEE conference format

2605.15249 2026-05-18 cs.CR cs.LG

Enabling Adversarial Robustness in AI Models through Kubeflow MLOps

Stavros Bouras, Ioannis Korontanis, Antonios Makris, Konstantinos Tserpes

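AI summary: This paper studies how to improve the adversarial robustness of AI models in Kubernetes environments. The authors propose an architecture based on Kubeflow MLOps that automatically detects adversarial attacks at inference time and triggers defense mechanisms, preserving model accuracy and reliability. Experiments show that the approach effectively strengthens resistance to adversarial attacks and substantially recovers the performance lost to them.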

Comments Accepted at the 1st Workshop on Secure and Intelligent Data Spaces (SIDS 2026), co-located with the 27th IEEE International Conference on Mobile Data Management (MDM 2026)

2605.15241 2026-05-18 eess.IV cs.CV cs.LG

From Full and Partial Intraoral Scans to Crown Proposal: A Classification-Guided Restoration Assistance Pipeline

Rabin Kunwar, Dikshya Parajuli, Rujal Acharya, Romik Gosai, Prince Panta, Kundan Siwakoti, Shuvangi Adhikari, Saugat Kafley, Louis Digiorgio, Amit Regmi, Akio Tanaka, Masahiko Inada, Yuriko Komagamine, Kennta Kashiwazaki, Manabu Kanazawa

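AI summary: This study proposes an end-to-end crown-proposal pipeline that generates personalized initial crown designs from full-arch or partial-arch intraoral scans to assist clinicians with subsequent adjustment. The method combines a classification-guided segmentation strategy with context-based retrieval and fitting, addressing low segmentation accuracy on partial scans and the loss of detail in generated crowns. Experiments show strong results across multiple evaluation metrics, with high segmentation accuracy and practical value.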

2605.15240 2026-05-18 stat.ML cs.LG

On Kernel Eigen-alignments of KRR: Reconstruction and Generalization

Yang Liu, Ernest Fokoue, Richard Lange, Daniel Krutz

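AI summary: This paper studies the key role of eigen-alignment between the kernel matrix and the learning target in achieving robust generalization, establishing a direct link between the generalization performance of kernel methods and the estimation of the matrix's eigenvectors and eigenvalues. By analyzing how perturbations of the kernel matrix affect predictions, the authors derive generalization error upper bounds based on the stability of eigenvalue and eigenvector estimates, and point out that for high-rank kernels, reconstruction error has limited power to predict generalization. From the perspective of eigenvalue estimation, the work proposes new generalization bounds showing that strong generalization requires stronger eigenvector alignment, larger eigenvalue magnitudes, or larger gaps between adjacent eigenvalues.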

2605.15238 2026-05-18 cs.SE cs.AI cs.PL

Hydra: Efficient, Correct Code Generation via Checkpoint-and-Rollback Support

Alexander Du, Jianjun Ou, Danyang Zhuo, Matthew Lentz

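AI summary: This paper proposes Hydra, a system for efficiently recovering from static errors during code generation. Through asynchronous checking and a checkpoint-and-rollback mechanism, Hydra avoids the high latency and token cost of conventional approaches, detecting and fixing errors promptly during generation without regenerating the already-correct parts of the code. Experiments show that on C/C++ code-generation tasks, Hydra substantially reduces latency and token usage compared with post-hoc repair.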

2605.15237 2026-05-18 cs.AR cs.AI

A3D: Agentic AI flow for autonomous Accelerator Design

Abinand Nallathambi, Christopher Knight, Shantanu Ganguly, Wilfried Haensch, Anand Raghunathan

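AI summary: A3D is an agent-based AI flow for end-to-end automated hardware accelerator design. By autonomously analyzing workloads, identifying performance bottlenecks, refactoring code for high-level synthesis tools, and generating micro-architectures, it substantially reduces the complexity of accelerator design and the need for manual intervention. A3D can also automatically explore the speed-area tradeoff space to produce diverse accelerator designs, providing an efficient, automated accelerator-design solution for complex scientific applications.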

Abstract

Accelerating applications through the design of hardware accelerators can significantly enhance system performance and energy efficiency. Despite advances, such as high-level synthesis (HLS), designing accelerators for complex applications remains highly labor-intensive, demanding considerable expertise in understanding workloads to be accelerated, hardware design, micro-architecture, and EDA tool usage, posing challenges for application domain experts. Therefore, most accelerator solutions are limited to applications with a regular predictable dataflow. Advances in AI have enabled agents that perform autonomous planning, reasoning, execution and reflection, leading to unprecedented potential for automation through agentic AI. We present A3D, an Agentic AI flow for end-to-end Automation of hardware Accelerator Design. A3D automates workload analysis, performance bottleneck identification, code refactoring for HLS compatibility and micro-architecture generation. A3D also generates diverse accelerator designs by automatically exploring the speed-area tradeoff space. Recent efforts have explored the use of AI for specific tasks such as design space exploration in HLS, leaving several tasks to still be performed manually. A3D addresses the challenges in applying modern LLMs to accelerator design by judiciously partitioning tasks among specialist agents, orchestrating process loops with specialist and verifier agents, utilizing pre-existing and custom tools, and employing agentic RAG for codebase and proprietary EDA tool documentation exploration. Our implementation of A3D, using commercial components like Claude Sonnet 4.5 and the Catapult HLS tool, demonstrates its effectiveness by generating accelerator designs with no human intervention from complex scientific applications like LAMMPS (molecular dynamics simulation) and QMCPACK (quantum chemistry).

2605.15226 2026-05-18 cs.AR cs.AI cs.SE

Is Agentic AI Ready for Real-World Hardware Engineering? A Deep Dive with Phoenix-bench

Qingyun Zou, Feng Yu, Hongshi Tan, Bingsheng He, WengFai Wong

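AI summary: This paper examines whether agentic AI systems built for software engineering are suited to real-world hardware engineering tasks and introduces Phoenix-bench, a benchmark of 511 verified Verilator instances that supports comprehensive evaluation of hardware design-flow, bug-fixing, and verification tasks. The study finds that hardware and software engineering differ markedly in how bugs propagate and how they are fixed, and that localization precision and feedback mechanisms strongly affect agent performance, offering guidance for future use of agents in hardware engineering.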

Abstract

We ask whether agentic AI systems built for software engineering transfer to realistic hardware engineering. Existing hardware LLM benchmarks isolate sub-tasks, but none jointly requires repository navigation, hierarchy-aware localization, Electronic Design Automation (EDA) executable verification, and maintenance-style patching. We introduce Phoenix-bench, a synchronized corpus of 511 verified Verilator instances from 114 GitHub repositories, each shipped with the developer patch, design-flow labels, fail-to-pass and pass-to-pass testbenches, and a Docker-pinned EDA environment so that resolved-rate differences reflect agent behavior rather than toolchain availability. Using Phoenix-bench, we run a uniform evaluation of four commercial agents and eight open-source agentic structures across four LLM backbones, plus two diagnostic interventions (file-level oracle localization and one round of testbench-log feedback). Three findings emerge. (i) Software and hardware are fundamentally different engineering tasks: the same agent loses 37% to 58% from SWE-bench Verified to Phoenix-bench because hardware bugs propagate across parallel instantiated modules through signal flow rather than along a software-style call graph, and software-tuned agents stop at the symptom file instead of tracing back through the instantiation chain. (ii) Failures concentrate on design control-flow / finite state machine (FSM) bugs, verification testbench bugs, and hard cases that demand cross-hierarchy signal-flow tracking and coordinated multi-file edits. (iii) Localization granularity matters far more than localization itself: a perfect file-level oracle yields only +1.4% because the agent then breaks files that did not need editing, while a single round of test-case feedback lifts the resolved rate by 42% to 45% because the test case tells where the bug is and what the fix has to look like.
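
Each instance ships fail-to-pass and pass-to-pass testbenches. Under the SWE-bench-style convention (assumed here; the abstract does not state the exact rule), an instance counts as resolved only when every fail-to-pass test now passes and no pass-to-pass test regresses. The sketch below computes a resolved rate under that assumption; `InstanceResult` and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InstanceResult:
    """Hypothetical record of one instance after applying an agent's patch."""
    instance_id: str
    fail_to_pass: dict  # test name -> passes after the patch?
    pass_to_pass: dict  # test name -> still passes after the patch?


def is_resolved(r: InstanceResult) -> bool:
    """Resolved iff the targeted tests are fixed and nothing regressed."""
    return all(r.fail_to_pass.values()) and all(r.pass_to_pass.values())


def resolved_rate(results) -> float:
    """Fraction of instances that count as resolved (0.0 for no instances)."""
    results = list(results)
    if not results:
        return 0.0
    return sum(is_resolved(r) for r in results) / len(results)


results = [
    InstanceResult("fixed", {"t_bug": True}, {"t_ok": True}),
    InstanceResult("regressed", {"t_bug": True}, {"t_ok": False}),
    InstanceResult("unfixed", {"t_bug": False}, {"t_ok": True}),
]
print(resolved_rate(results))  # 0.3333333333333333
```

An edit that breaks a file that did not need changing can flip a pass-to-pass test, so under this rule the instance no longer counts as resolved.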

2605.15225 2026-05-18 q-bio.QM cs.AI

Do Biological Structural Guarantees Earn Their Complexity?

Bogdan Banu

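AI summary: This paper asks whether biological structural guarantees are worth their complexity. Using three in-depth benchmarks, it compares AI frameworks built on biological mechanisms (such as metabolism-first gating, autoinducer quorum sensing, and Bayesian stagnation detection) against non-biological alternatives and simplified controls over thousands of trials, verifying the actual reliability benefits and costs of the biological structures.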

2605.15223 2026-05-18 cs.AR cs.AI

GenAI-Driven Approach to RISC-V Supply Chain Exploration

Nenad Petrovic, Andre Schamschurko, Yingjie Xu, Alois Knoll

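AI summary: This paper proposes a large language model (LLM)-based pipeline for analyzing the RISC-V supply chain that combines vision-language models (VLMs) and model-driven engineering (MDE) to enable multimodal, data-driven analysis of heterogeneous, unstructured supply-chain data. The LLM interprets textual information while the VLM extracts information from visual documents such as charts and tables; together they build a supply-chain knowledge graph, and MDE techniques are then used for dependency validation, bottleneck detection, and risk assessment, supporting both exploratory and systematic analysis of supply-chain resilience. Experiments show that the approach effectively improves supply-chain transparency and decision support in the RISC-V ecosystem.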

2605.15222 2026-05-18 cs.SE cs.CL cs.PL

PerfCodeBench: Benchmarking LLMs for System-Level High-Performance Code Optimization

Huihao Jing, Wenbin Hu, Haochen Shi, Hanyu Yang, Sirui Zhang, Shaojin Chen, Haoran Li, Yangqiu Song

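AI summary: PerfCodeBench is an executable benchmark for evaluating the system-level high-performance code-optimization abilities of large language models. It focuses on real-world system tasks that require hardware-aware optimization and handling of performance bottlenecks; each task includes correctness checks, a baseline implementation, and a reference optimized solution, so that both correctness and runtime efficiency are evaluated. Experiments show that current mainstream LLMs still lag well behind expert implementations in generating efficient code, performing especially poorly on parallel-computing and GPU tasks, which underscores the importance of performance-oriented evaluation.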

2605.15221 2026-05-18 cs.SE cs.AI cs.CL

Effective Harness Engineering for Algorithm Discovery with Coding Agents

Yoichi Ishibashi, Taro Yano, Masafumi Oyamada

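AI summary: This paper studies how to design effective execution harnesses for algorithm discovery that improve automated algorithm generation based on large language models and evolutionary search. By analyzing the number versus depth of generated algorithms, the handling of evaluation exploits, and the safety of parallel execution, the authors propose an improved Vesper framework and validate it on the circle-packing problem. Experiments show that under a fixed compute budget, generating fewer but deeper algorithms yields better results, and that stronger models are more prone to producing evaluation exploits, highlighting the importance of exploit detection.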

2605.15213 2026-05-18 cs.IR cs.AI

An LLM-RAG Approach for Healthy Eating Index-Informed Personalized Food Recommendations

Yibin Wang, Yanjie Yang, Grace Melo Guerrero, Rodolfo M. Nayga, Azlan Zahid

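AI summary: This study proposes a retrieval-augmented generation (RAG) framework informed by the Healthy Eating Index (HEI) for generating personalized healthy-eating recommendations. The method combines a standardized nutrition database with a large language model, building a food embedding space and computing HEI scores to give users personalized dietary advice that meets health standards. Experimental results show that the method effectively raises users' HEI scores and improves diet quality.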

2605.15203 2026-05-18 cs.IR cs.AI cs.MA

Agent4POI: Agentic Context-Conditioned Affordance Reasoning for Multimodal Point-of-Interest Recommendation

Jinze Wang, Yangchen Zeng, Tiehua Zhang, Lu Zhang, Yuze Liu, Yongchao Liu, Xingjun Ma, Zhu Sun

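AI summary: This paper proposes Agent4POI, a point-of-interest (POI) recommendation framework whose core idea is to generate context-conditioned multimodal representations dynamically at recommendation time rather than relying on precomputed static POI embeddings. A four-stage large language model agent generates dynamic, scenario-specific affordance queries from the situational context and performs cross-modal reasoning over images, reviews, and metadata, ultimately producing structured, uncertainty-aware affordance representations that improve recommendation accuracy and adaptability. Experiments show that Agent4POI outperforms existing methods across multiple benchmark datasets and evaluation settings, especially in cold-start and context-shift scenarios.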

2605.14859 2026-05-18 cs.CR cs.AI

Do Coding Agents Understand Least-Privilege Authorization?

Zheng Yan, Jingxiang Weng, Charles Chen, Dengyun Peng, Ethan Qin, Jiannan Guan, Jinhao Liu, Qiming Yu, Yixin Yuan, Fanqing Meng, Carl Che, Mengkang Hu

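AI summary: As coding agents increasingly access system shells, code repositories, and user files, least-privilege authorization becomes a prerequisite for safe deployment. This paper studies whether current models can infer permission boundaries on their own, proposes the permission-boundary reasoning task, and builds AuthBench, a benchmark of 120 real-world terminal tasks. It finds that existing models often omit necessary permissions or grant unnecessary ones, and that additional reasoning time does not effectively solve the problem. The authors therefore propose a sufficiency-tightness decomposition that generates a covering policy by forward-simulating the task and then reviews each granted permission, substantially improving the success rate on sensitive tasks and reducing the likelihood of successful attacks.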

2605.14716 2026-05-18 cs.GR cs.CV cs.LG

AnchorRoute: Human Motion Synthesis with Interval-Routed Sparse Control

Pengcheng Fang, Tengjiao Sun, Dongjie Fu, Xiaoyu Zhan, Yanwen Guo, Hansung Kim, Xiaohao Cai

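AI summary: AnchorRoute is a sparse-anchor human motion synthesis framework that generates full-body motion from a small number of user-specified root positions, planar trajectories, or body-point targets. During generation, it uses the anchors to produce conditioning features that are injected into a pretrained diffusion model, preserving generation quality while learning sparse spatial control; after generation, it defines correction intervals from anchor residuals and refines the motion with soft token updates, unifying generation and refinement within a single anchor framework. Experiments show that AnchorRoute outperforms existing methods under various control modes, producing motion that adheres more closely to the anchor constraints.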

2605.13143 2026-05-18 cs.IT cs.LG math.IT

On the Generalization of Knowledge Distillation: An Information-Theoretic View

Bingying Li, Haiyun He

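AI summary: This paper studies the generalization of knowledge distillation from an information-theoretic perspective. It proposes a theoretical framework based on coupled stochastic processes and defines the "distillation divergence" as a measure of the discrepancy between teacher and student models. Using this framework, the authors derive upper and lower bounds on the student's generalization gap relative to the teacher, and further propose a tighter bound that accounts for the sharpness of the loss function, revealing how the local flatness of the teacher affects generalization. In a case study on a Gaussian linear model, the distillation divergence decomposes into bias, variance, and rank-bottleneck costs, providing interpretable guidance for distillation design.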

Comments 18 pages

2605.09994 2026-05-18 cs.DC cs.LG

BatchWeave: A Consistent Object-Store-Native Data Plane for Large Foundation Model Training

Ting Sun, Junjie Zhang, Xiao Yan, Songxin Zhang, Zhuoyang Song, Jingyi Xi, Zunyao Mao, Bingyi Jing, Jiaxing Zhang, Zejian Xie

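AI summary: As large foundation model (LFM) training evolves, the data pipeline must shift from a static data-loading layer to a component that coordinates dynamically with training. Existing systems fall short in fault isolation and support for batch semantics. BatchWeave proposes an object-store-native training data plane that coordinates batch publication and recovery through versioned manifests and conditional object writes. Its core techniques include transactional global batches (TGBs), recovery and retention mechanisms implemented directly in the storage layer, and a distributed adaptive commit algorithm that requires no communication among producers, substantially improving training throughput and system reliability.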

2605.09033 2026-05-18 cs.CR cs.AI

ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts

Yang Luo, Zifeng Kang, Tiantian Ji, Xinran Liu, Yong Liu, Shuyu Li, Lingyun Peng

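AI summary: This paper proposes ShadowMerge, a new poisoning attack on graph-based agent memory that influences agent behavior by exploiting relation-channel conflicts. The attack constructs malicious relations that share the same query-activation anchor and relation channel as legitimate relations but carry conflicting values, injecting harmful information without disrupting normal tasks. Experiments show that ShadowMerge achieves attack success rates of up to 93.8% on multiple real-world datasets, substantially outperforming existing methods, and exposes the shortcomings of current defenses against such attacks.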

Comments Preprint. Corresponding authors: Zifeng Kang and Tiantian Ji. Code is available at https://anonymous.4open.science/status/ShadowMerge-033C

2605.02651 2026-05-18 cs.DL cs.LG

ARA: Agentic Reproducibility Assessment For Scalable Support Of Scientific Peer-Review

Kevin Riehl, Andres L. Marin, Nikofors Zacharof, Fan Wu, Patrick Langer, Robert Jakob, Anastasios Kouvelas, Georgios Fontaras, Michail A. Makridis

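AI summary: As the scale and complexity of modern research output grow, scientific peer review faces challenges in assessing reproducibility. ARA (Agentic Reproducibility Assessment) formalizes reproducibility assessment as a structured reasoning task over scientific documents: it builds a directed workflow graph of data sources, methods, experiments, and results, and evaluates it using structural and content features. Experiments show that ARA generalizes well across domains and across different large language models, with accuracy substantially higher than existing methods on multiple benchmarks, demonstrating its potential for supporting scientific peer review at scale.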

2605.01970 2026-05-18 cs.CR cs.AI

Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration

Debeshee Das, Julien Piet, Darya Kaviani, Luca Beurer-Kellner, Florian Tramèr, David Wagner

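AI summary: This paper studies the "Trojan Hippo" attack on large language model agents, which plants a covert payload in an agent's long-term memory that activates when the user discusses sensitive topics, enabling data exfiltration. The study proposes a dynamic evaluation framework for systematically assessing different memory architectures and defense mechanisms, and demonstrates the attack's high success rate (85%-100%) on a realistic email-assistant system. It also analyzes the effect of several defenses, revealing a tradeoff between security and utility and offering guidance for practical defense deployment.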

Abstract

Memory systems enable otherwise-stateless LLM agents to persist user information across sessions, but also introduce a new attack surface. We characterize the Trojan Hippo attack, a class of persistent memory attacks that operates in a more realistic threat model than prior memory poisoning work: the attacker plants a dormant payload into an agent's long-term memory via a single untrusted tool call (e.g., a crafted email), which activates only when the user later discusses sensitive topics such as finance, health, or identity, and exfiltrates high-value personal data to the attacker. While anecdotal demonstrations of such attacks have appeared against deployed systems, no prior work systematically evaluates them across heterogeneous memory architectures and defenses. We introduce a dynamic evaluation framework comprising two components: (1) an OpenEvolve-based adaptive red-teaming benchmark that stress-tests defenses and memory backends against continuously refined attacks, and (2) the first capability-aware security/utility analysis for persistent memory systems, enabling principled reasoning about defense deployment across different usage profiles. Instantiated on an email assistant across four memory backends (explicit tool memory, agentic memory, RAG, and sliding-window context), Trojan Hippo achieves up to 85-100% ASR against current frontier models from OpenAI and Google, with planted memories successfully activating even after 100 benign sessions. We evaluate four memory-system defenses inspired by basic security principles, finding they substantially reduce attack success rates (to as low as 0-5%), though at utility costs that vary widely with task requirements. Because of this substantial security-utility tradeoff, the effective real-world deployment of defenses remains an open challenge, which our evaluation framework is specifically designed to address.

2605.00424 2026-05-18 cs.CR cs.AI cs.MA cs.SE

Skills as Verifiable Artifacts: A Trust Schema and a Biconditional Correctness Criterion for Human-in-the-Loop Agent Runtimes

Alfredo Metere

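AI summary: This paper studies how to verify the trustworthiness of skills (structured instruction packages that extend large language models) in human-in-the-loop agent runtimes. The author proposes a trust schema and a biconditional correctness criterion requiring that skills be verified before loading, rather than relying on trust mechanisms such as signatures or provenance registries. Through explicit verification tiers and capability-gating policies, human intervention is triggered only when verification fails, improving the scalability and sustainability of the system. The contributions are general and do not depend on model retraining or proprietary infrastructure.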

2604.14572 2026-05-18 cs.IR cs.AI cs.CL cs.MA

Don't Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG

Yiqun Sun, Pengfei Wei, Lawrence B. Hsieh

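AI summary: This paper proposes Corpus2Skill, which distills an enterprise document corpus offline into a hierarchical skill directory, allowing a large language model to actively navigate the knowledge base when answering questions instead of passively retrieving from it. On an enterprise customer-service benchmark, the method delivers better answer quality and evidence support than several RAG baselines, reveals the advantages of navigation-based approaches for domain-specific knowledge bases, and provides guidance for the architectural design of knowledge-grounded systems.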