2602.00510 2026-06-19 cs.AI cs.LG cs.SE (version update)

PCBSchemaGen: Reward-Guided LLM Code Synthesis for Printed Circuit Boards (PCB) Schematic Design with Structured Verification

Huanghaohe Zou, Peng Han, Emad Nazerian, Mafu Zhang, Zhicheng Guo, Alex Q. Huang

Affiliations: Semiconductor Power Electronics Center (SPEC); The University of Texas at Austin; Arizona State University

AI summary: Proposes the PCBSchemaGen framework, which uses a structured verifier to guide a frozen LLM toward repairable PCB schematics, achieving high accuracy in a domain without unit tests.

Abstract

Most LLM code-synthesis benchmarks rely on unit tests as the reward oracle, but PCB schematic design has none: correctness is defined by structured physical constraints over real IC packages and pin-level assignments, per-task golden references are unavailable, and SPICE simulation does not validate schematic-level correctness. We introduce PCBSchemaGen, a training-free inference-time framework that turns a frozen LLM into a verifiable, repairable PCB schematic generator. The framework induces a domain schema from IC datasheets to ground LLM decoding, pairs it with a deterministic 5-layer continuous-reward verifier with pin-level error localization, and refines candidates through a Thompson Sampling arm-acquiring bandit. We evaluate on 2 PCB benchmarks covering 227 real-IC tasks across 22 unified circuit domains, including a public-schematic-derived suite that serves as a fully held-out generalization test (verifier, KG library, and prompts frozen before any evaluation). Under our framework, an open-weight 31B model (Gemma-4-31B) passes 81.3% of PCBBench tasks on average, and the same framework transfers across both benchmarks with zero verifier code changes; a Circuitron-style inference-time prompting baseline on the same Gemma-4-31B backbone collapses on hard system-level designs. This suggests inference-time refinement under a deterministic structural verifier is a general recipe for reference-free LLM code synthesis in domains without unit-test oracles. Our benchmarks and deterministic verifier are publicly available at https://github.com/HZou9/PCBSchemaGen_v2.
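
The Thompson Sampling refinement loop mentioned in the abstract can be illustrated with a minimal sketch. It assumes a Beta posterior per arm and a simulated 0/1 reward standing in for the verifier score; the arm names, the fractional update rule, and `run_bandit` are hypothetical and are not taken from PCBSchemaGen, and the arm-acquiring part (adding arms over time) is omitted.

```python
import random


class ThompsonArm:
    """Beta posterior over one arm's mean reward in [0, 1]."""

    def __init__(self, name):
        self.name = name
        self.alpha = 1.0  # prior pseudo-count for reward mass
        self.beta = 1.0   # prior pseudo-count for missing reward mass

    def sample(self, rng):
        # One draw from the current posterior.
        return rng.betavariate(self.alpha, self.beta)

    def update(self, reward):
        # Fractional update so a continuous reward in [0, 1] can be used
        # (an assumption made for this sketch).
        self.alpha += reward
        self.beta += 1.0 - reward


def select_arm(arms, rng):
    """Draw one posterior sample per arm and return the arm with the largest draw."""
    return max(arms, key=lambda arm: arm.sample(rng))


def run_bandit(mean_rewards, rounds, seed=0):
    """Simulate refinement rounds; mean_rewards maps hypothetical arm names to true means."""
    rng = random.Random(seed)
    arms = [ThompsonArm(name) for name in mean_rewards]
    pulls = {name: 0 for name in mean_rewards}
    for _ in range(rounds):
        arm = select_arm(arms, rng)
        # Stand-in for a verifier score: Bernoulli reward with the arm's true mean.
        reward = 1.0 if rng.random() < mean_rewards[arm.name] else 0.0
        arm.update(reward)
        pulls[arm.name] += 1
    return pulls
```

Calling `run_bandit({"good": 0.9, "bad": 0.1}, rounds=500)` returns per-arm pull counts; the arm with the higher true mean ends up pulled far more often as its posterior sharpens.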

2601.22970 2026-06-19 cs.LG cs.AI (version update)

Stabilizing the Q-Gradient Field for Policy Smoothness in Actor-Critic Methods

Jeong Woon Lee, Kyoleen Kwak, Daeho Kim, Hyoseok Hwang

Affiliations: College of Software, Kyung Hee University

AI summary: To address policy oscillation in continuous-action actor-critic methods, proposes PAVE, a framework grounded in the critic's differential geometry that smooths the policy by stabilizing the Q-gradient field, without modifying the actor.

Abstract

Policies learned via continuous actor-critic methods often exhibit erratic, high-frequency oscillations, making them unsuitable for physical deployment. Current approaches attempt to enforce smoothness by directly regularizing the policy's output. We argue that this approach treats the symptom rather than the cause. In this work, we theoretically establish that policy non-smoothness is fundamentally governed by the differential geometry of the critic. By applying implicit differentiation to the actor-critic objective, we prove that the sensitivity of the optimal policy is bounded by the ratio of the Q-function's mixed-partial derivative (noise sensitivity) to its action-space curvature (signal distinctness). To empirically validate this theoretical insight, we introduce PAVE (Policy-Aware Value-field Equalization), a critic-centric regularization framework that treats the critic as a scalar field and stabilizes its induced action-gradient field. PAVE rectifies the learning signal by minimizing the Q-gradient volatility while preserving local curvature. Experimental results demonstrate that PAVE achieves smoothness comparable to policy-side smoothness regularization methods, while maintaining competitive task performance, without modifying the actor.
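
The implicit-differentiation argument in the abstract can be sketched as follows. This is the textbook implicit-function-theorem computation, not necessarily the paper's exact theorem. It assumes a deterministic policy $\pi^*(s)$ that maximizes $Q(s, a)$ over $a$, a twice continuously differentiable $Q$, and a negative definite action Hessian $\nabla^2_{aa} Q$ at the optimum; all symbols are introduced here for illustration.

```latex
\begin{aligned}
&\nabla_a Q\bigl(s, \pi^*(s)\bigr) = 0
  && \text{(first-order optimality)} \\
&\nabla^2_{as} Q + \nabla^2_{aa} Q \,\frac{\partial \pi^*}{\partial s} = 0
  && \text{(differentiate with respect to } s\text{)} \\
&\frac{\partial \pi^*}{\partial s}
  = -\bigl(\nabla^2_{aa} Q\bigr)^{-1} \nabla^2_{as} Q \\
&\left\lVert \frac{\partial \pi^*}{\partial s} \right\rVert_2
  \le \frac{\lVert \nabla^2_{as} Q \rVert_2}{\lambda_{\min}\bigl(-\nabla^2_{aa} Q\bigr)}
\end{aligned}
```

The numerator is the mixed-partial term (how strongly a state perturbation moves the action gradient) and the denominator is the action-space curvature, which matches the ratio described in the abstract.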

2601.21542 2026-06-19 cs.CV cs.AI (version update)

Bi-Anchor Interpolation Solver for Accelerating Generative Modeling

Hongxu Chen, Hongxiang Li, Zhen Wang, Long Chen

Affiliations: The Hong Kong University of Science and Technology

AI summary: Proposes BA-solver, in which a lightweight SideNet (1-2% of the backbone size) learns bidirectional temporal perception and bi-anchor velocity integration; without retraining the backbone and at very low training cost, it reaches the quality of a 100+-step Euler solver within 10 steps and supports plug-and-play use.

Abstract

Flow Matching (FM) models have emerged as a leading paradigm for high-fidelity synthesis. However, their reliance on iterative Ordinary Differential Equation (ODE) solving creates a significant latency bottleneck. Existing solutions face a dichotomy: training-free solvers suffer from significant performance degradation at low Neural Function Evaluations (NFEs), while training-based one- or few-step generation methods incur prohibitive training costs and lack plug-and-play versatility. To bridge this gap, we propose the Bi-Anchor Interpolation Solver (BA-solver). BA-solver retains the versatility of standard training-free solvers while achieving significant acceleration by introducing a lightweight SideNet (1-2% backbone size) alongside the frozen backbone. Specifically, our method is founded on two synergistic components: 1) Bidirectional Temporal Perception, where the SideNet learns to approximate both future and historical velocities without retraining the heavy backbone; and 2) Bi-Anchor Velocity Integration, which utilizes the SideNet with two anchor velocities to efficiently approximate intermediate velocities for batched high-order integration. By utilizing the backbone to establish high-precision "anchors" and the SideNet to densify the trajectory, BA-solver enables large interval sizes with minimized error. Empirical results on ImageNet-256^2 demonstrate that BA-solver achieves generation quality comparable to a 100+ NFE Euler solver in just 10 NFEs and maintains high fidelity in as few as 5 NFEs, incurring negligible training costs. Furthermore, BA-solver ensures seamless integration with existing generative pipelines, facilitating downstream tasks such as image editing.
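
To make the NFE trade-off concrete, here is a minimal sketch of the fixed-step Euler baseline that the abstract compares against. It does not implement BA-solver or its SideNet; `toy_velocity` is a hypothetical stand-in for a learned velocity network, chosen because its ODE has a closed-form solution.

```python
import math


def euler_solve(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler.

    Each step calls `velocity` once, so the NFE count equals num_steps.
    """
    x, t = x0, 0.0
    dt = 1.0 / num_steps
    for _ in range(num_steps):
        x = x + dt * velocity(x, t)
        t += dt
    return x


def toy_velocity(x, t):
    # Hypothetical stand-in for a learned velocity field: dx/dt = -x.
    return -x


exact = math.exp(-1.0)  # closed-form solution at t=1 for x0 = 1
error_5 = abs(euler_solve(toy_velocity, 1.0, 5) - exact)      # 5 NFEs
error_100 = abs(euler_solve(toy_velocity, 1.0, 100) - exact)  # 100 NFEs
```

On this toy problem the 5-step error is roughly twenty times the 100-step error, which is the low-NFE degradation that few-step solvers aim to close.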

2601.22107 2026-06-19 cs.LG (version update)

Neural Minimum Distance Estimation for Collision-Aware Manipulation of Multi-Arm Laparoscopic Surgical Robots via Simulation-Based Learning

Sarvin Ghiasi, Majid Roshanfar, Jake Barralet, Liane S. Feldman, Amir Hooshiar

Affiliations: Surgical Performance Enhancement and Robotics (SuPER) Centre, Department of Surgery; The Wilfred and Joyce Posluns Centre for Image Guided Innovation & Therapeutic Intervention (PCIGITI); The Hospital for Sick Children (SickKids)

AI summary: Proposes a framework combining analytical modeling, real-time simulation, and a deep residual neural network for minimum distance estimation and collision warning in multi-arm surgical robots; the model attains R² = 0.940 and RMSE = 42.0 mm on the validation set.

DOI: 10.3390/s26123744
Journal ref: Sensors 2026, 26(12), 3744

Abstract

This study presents an integrated framework for enhancing the safety and operational efficiency of robotic arms in laparoscopic surgery by addressing minimum distance estimation between multi-arm manipulators and the associated collision-aware warning. By combining analytical modeling, real-time simulation, and machine learning, the framework offers a robust solution for ensuring safe robotic operations. An analytical model was developed to estimate the minimum distances between robotic arms based on their joint configurations, offering theoretical calculations that serve as both a validation tool and a benchmark. To complement this, a 3D simulation environment was created to model two 7-DOF Kinova robotic arms (Kinova Inc., Boisbriand, QC, Canada), generating a diverse dataset of configurations for distance estimation and collision warning. Using these insights, a deep residual neural network model was trained with joint configurations as inputs. On the held-out validation set, the model achieves R² = 0.940, RMSE = 42.0 mm, MAE = 28.7 mm, and a near-zero mean bias, demonstrating strong predictive accuracy and consistent generalization across the workspace. The framework is intended as an early collision warning layer, where a warning is triggered when the predicted inter-arm distance falls below a 0.2 m threshold, which corresponds to a surface-to-surface clearance of approximately 50 mm given the Kinova Gen3 (Kinova Inc., Boisbriand, QC, Canada) cross-sectional radius. This work demonstrates the effectiveness of combining analytical modeling with machine learning to enhance the precision and reliability of multi-arm robotic systems.
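
The warning rule and the reported metrics can be sketched in a few lines. This assumes the predicted distance is measured between arm centerlines and that both arms share one radius; the 0.075 m per-arm radius used in the usage note is inferred from the abstract's numbers (0.2 m threshold minus 50 mm clearance, split evenly), not a value stated by the authors. Function names are hypothetical.

```python
import math

WARNING_THRESHOLD_M = 0.2  # threshold stated in the abstract


def should_warn(predicted_distance_m, threshold_m=WARNING_THRESHOLD_M):
    """Trigger a warning when the predicted inter-arm distance is below the threshold."""
    return predicted_distance_m < threshold_m


def surface_clearance_m(center_distance_m, radius_a_m, radius_b_m):
    """Convert a centerline distance to a surface-to-surface clearance."""
    return center_distance_m - radius_a_m - radius_b_m


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, mean bias, and R^2 for paired distance values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "bias": sum(errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }
```

Under the equal-radius assumption, `surface_clearance_m(0.2, 0.075, 0.075)` gives about 0.05 m, consistent with the clearance quoted in the abstract.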

2509.03122 2026-06-19 cs.CL cs.AI cs.LG (version update)

From Construction to Injection: Edit-Based Fingerprints for Large Language Models

Yue Li, Xin Yi, Dongsheng Shi, Yongyi Cui, Gerard de Melo, Linlin Wang

Affiliations: East China Normal University; Hasso Plattner Institute/University of Potsdam

AI summary: Proposes an end-to-end injected fingerprinting framework that uses code-mixing fingerprints and multi-candidate editing to address the imperceptibility and robustness challenges of fingerprints in black-box deployment.

Comments: preprint

Abstract

Reliable model fingerprints are essential for protecting large language models (LLMs) against unauthorized redistribution and commercial misuse. In black-box deployment, verification is hindered by defensive filtering of suspected fingerprint queries, as well as by downstream model modifications that may weaken embedded ownership evidence. These risks require fingerprints to be robust in both construction and injection. For construction, prior paradigms face an imperceptibility trade-off: natural-language fingerprints may be accidentally activated, whereas garbled fingerprints are statistically exposed and easier to filter. For injection, existing methods struggle to preserve persistent trigger-target behaviors under model modification. We propose an end-to-end injected fingerprinting framework to address these challenges. Code-mixing Fingerprints (CF) use lowest-perplexity code-mixing under a high-complexity constraint to mitigate this two-sided imperceptibility trade-off. Multi-Candidate Editing (MCEdit) constructs structurally redundant, margin-separated trigger-target mappings to enable graceful degradation under model modification. Extensive evaluations on imperceptibility, detectability, and harmlessness demonstrate robust ownership verification with negligible impact on utility.

2510.01565 2026-06-19 cs.LG cs.DC (version update)

TetriServe: Efficiently Serving Mixed DiT Workloads

Runyu Lu, Shiqi He, Wenxuan Tan, Shenggui Li, Ruofan Wu, Jeff J. Ma, Ang Chen, Mosharaf Chowdhury

Affiliations: University of Michigan; University of Wisconsin-Madison; Nanyang Technological University

AI summary: For heterogeneous DiT workloads with mixed resolutions and deadlines, proposes TetriServe, a system built on step-level sequence parallelism that uses round-based scheduling and adaptive parallelism to raise SLO attainment by up to 32% without degrading image quality.

Abstract

Diffusion Transformer (DiT) models excel at generating high-quality images through iterative denoising steps, but serving them under strict Service Level Objectives (SLOs) is challenging due to their high computational cost, particularly at larger resolutions. Existing serving systems use fixed-degree sequence parallelism, which is inefficient for heterogeneous workloads with mixed resolutions and deadlines, leading to poor GPU utilization and low SLO attainment. In this paper, we propose step-level sequence parallelism to dynamically adjust the degree of parallelism of individual requests according to their deadlines. We present TetriServe, a DiT serving system that implements this strategy for highly efficient image generation. Specifically, TetriServe introduces a novel round-based scheduling mechanism that improves SLO attainment by (1) discretizing time into fixed rounds to make deadline-aware scheduling tractable, (2) adapting parallelism at the step level and minimizing GPU hour consumption, and (3) jointly packing requests to minimize late completions. Extensive evaluation on state-of-the-art DiT models shows that TetriServe achieves up to 32% higher SLO attainment compared to existing solutions without degrading image quality.
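
The idea of adapting parallelism to a deadline while minimizing GPU-hour consumption can be illustrated with a deliberately simplified sketch. This is not TetriServe's scheduler: it handles a single request, ignores rounds and request packing, and assumes a hypothetical table `step_latency_s` that maps each sequence-parallel degree to a per-step latency.

```python
def pick_parallel_degree(step_latency_s, remaining_steps, time_left_s):
    """Return the deadline-feasible degree with the lowest GPU-seconds, or None.

    step_latency_s: dict mapping parallel degree (GPU count) to seconds per
    denoising step (hypothetical profile data).
    """
    best = None  # (degree, gpu_seconds)
    for degree, latency in sorted(step_latency_s.items()):
        finish_time = remaining_steps * latency
        if finish_time > time_left_s:
            continue  # this degree would miss the deadline
        gpu_seconds = degree * finish_time
        if best is None or gpu_seconds < best[1]:
            best = (degree, gpu_seconds)
    return None if best is None else best[0]
```

With a sub-linear speedup profile such as `{1: 1.0, 2: 0.6, 4: 0.4}` and 10 remaining steps, a 7-second budget selects degree 2: degree 1 misses the deadline, and degree 4 meets it but costs more GPU-seconds.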

2508.02604 2026-06-19 cs.RO cs.SY eess.SY (version update)

The Hidden Cost of Approximation in Online Mirror Descent

Ofir Schlisselberg, Uri Sherman, Tomer Koren, Yishay Mansour

Affiliations: Tel Aviv University; Google Research

AI summary: Studies the robustness of online mirror descent (OMD) to approximation errors and finds that regularizer smoothness is closely tied to error tolerance: uniformly smooth regularizers admit a tight bound, negative entropy on the simplex requires exponentially small errors, and log-barrier and Tsallis regularizers tolerate errors that are only polynomially small.

Abstract

Online mirror descent (OMD) is a fundamental algorithmic paradigm that underlies many algorithms in optimization, machine learning and sequential decision-making. The OMD iterates are defined as solutions to optimization subproblems which, oftentimes, can be solved only approximately, leading to an inexact version of the algorithm. Nonetheless, existing OMD analyses typically assume an idealized error-free setting, thereby limiting our understanding of performance guarantees that should be expected in practice. In this work we initiate a systematic study into inexact OMD, and uncover an intricate relation between regularizer smoothness and robustness to approximation errors. When the regularizer is uniformly smooth, we establish a tight bound on the excess regret due to errors. Then, for barrier regularizers over the simplex and its subsets, we identify a sharp separation: negative entropy requires exponentially small errors to avoid linear regret, whereas log-barrier and Tsallis regularizers remain robust even when the errors are only polynomial. Finally, we show that when the losses are stochastic and the domain is the simplex, negative entropy regains robustness, but this property does not extend to all subsets, where exponentially small errors are again necessary to avoid suboptimal regret.
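
For reference, the exact (error-free) OMD step with the negative-entropy regularizer on the simplex has a closed form, the multiplicative-weights update $x_{t+1,i} \propto x_{t,i}\, e^{-\eta g_{t,i}}$. The sketch below implements only this idealized step; it does not model the approximation errors the paper studies, and the function names are chosen here for illustration.

```python
import math


def omd_entropy_step(x, loss, eta):
    """One exact OMD step on the simplex with the negative-entropy regularizer."""
    weights = [xi * math.exp(-eta * gi) for xi, gi in zip(x, loss)]
    total = sum(weights)
    return [w / total for w in weights]  # renormalize back onto the simplex


def run_omd(losses, eta):
    """Play OMD from the uniform distribution; return the final iterate and total loss."""
    n = len(losses[0])
    x = [1.0 / n] * n
    total_loss = 0.0
    for loss in losses:
        total_loss += sum(xi * gi for xi, gi in zip(x, loss))
        x = omd_entropy_step(x, loss, eta)
    return x, total_loss
```

Starting from the uniform point on two coordinates with losses `[1.0, 0.0]` and `eta = 1.0`, one step moves the first coordinate to 1 / (1 + e), shifting mass toward the lower-loss coordinate.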

2508.04424 2026-06-19 cs.CV (version update)

Composed Object Retrieval: Object-level Retrieval via Composed Expressions

Tong Wang, Guanyu Yang, Nian Liu, Zongyan Han, Jinxing Zhou, Salman Khan, Fahad Shahbaz Khan

Affiliations: Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications, Southeast University, Ministry of Education, Jiangsu, China; Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE

AI summary: Proposes the Composed Object Retrieval (COR) task, which performs object-level retrieval by composing a reference object, its mask, and a retrieval text, and builds the COR125K benchmark and the CORE model, which significantly outperforms existing methods.

Abstract

Retrieving fine-grained visual content based on user intent remains a challenge in multimodal systems. Although current Composed Image Retrieval (CIR) methods combine reference images with retrieval texts, they are constrained to image-level matching and cannot localize specific objects. To this end, we propose Composed Object Retrieval (COR), a new object-level retrieval task that retrieves target object(s) from candidate objects in a target image and grounds the retrieved result with pixel-level masks. Given a reference object, its mask, a target image, and a retrieval text describing the desired modification, COR requires models to perform composed visual-textual reasoning rather than relying on explicit category names. This setting introduces several challenges, including fine-grained compositional matching, negative-object filtering under visually similar distractors, and flexible single- or multi-object retrieval. We construct COR125K, the first large-scale COR benchmark, containing 125,541 retrieval triplets across 408 categories with base/novel splits for evaluating category-level generalization. We also present CORE, a unified end-to-end model that integrates reference region encoding, adaptive vision-text interaction, and region-level contrastive learning to align composed representations with target objects while suppressing background and distractors. Extensive experiments demonstrate that CORE significantly outperforms existing CIR-based pipelines and strong baselines in both base and novel categories, establishing a simple and effective foundation for fine-grained object-level multimodal retrieval. Code will be released publicly at https://github.com/wangtong627/COR.

2511.04260 2026-06-19 cs.CV cs.AI (version update)

Proto-LeakNet: Towards Signal-Leak Aware Attribution in Synthetic Human Face Imagery

Claudio Giusti, Luca Guarnera, Sebastiano Battiato

Affiliations: Department of Mathematics and Computer Science; University of Catania

AI summary: Proposes Proto-LeakNet, which exploits signal-leak traces in diffusion models and combines closed-set classification with density-based open-set evaluation for interpretable generator attribution; trained on closed-set data, it remains effective on unseen generators.

Comments: 44 pages, 27 figures, 11 tables

DOI: 10.1016/j.cviu.2026.104848

Abstract

The growing sophistication of synthetic image and deepfake generation models has turned source attribution and authenticity verification into a critical challenge for modern computer vision systems. Recent studies suggest that diffusion pipelines unintentionally imprint persistent statistical traces, known as signal-leaks, within their outputs, particularly in latent representations. Building on this observation, we propose Proto-LeakNet, a signal-leak-aware and interpretable attribution framework that integrates closed-set classification with a density-based open-set evaluation on the learned embeddings, enabling analysis of unseen generators without retraining. Acting in the latent domain of diffusion models, our method re-simulates partial forward diffusion to expose residual generator-specific cues. A temporal attention encoder aggregates multi-step latent features, while a feature-weighted prototype head structures the embedding space and enables transparent attribution. Trained solely on closed data and achieving a Macro AUC of 98.13%, Proto-LeakNet learns a latent geometry that remains robust under post-processing, surpassing state-of-the-art methods, and achieves strong separability both between real images and known generators, and between known and unseen ones. The codebase is available at the following link: https://github.com/claudiunderthehood/Proto-LeakNet.

2510.18784 2026-06-19 cs.LG (version update)

CAGE: Curvature-Aware Gradient Estimation For Accurate Quantization-Aware Training

Soroush Tabesh, Mher Safaryan, Andrei Panferov, Alexandra Volkova, Dan Alistarh

Affiliations: Anonymous Authors

AI summary: Proposes CAGE, which improves the straight-through estimator with a curvature-aware correction term that balances loss minimization against quantization constraints, provides convergence guarantees in the smooth non-convex setting, and significantly improves the accuracy of low-bit quantization-aware training.

Comments: Accepted at MLSys 2026 (Oral). To appear in Proceedings of Machine Learning and Systems 8

Journal ref: Proceedings of Machine Learning and Systems 8 (MLSys 2026)

Abstract

Despite significant work on low-bit quantization-aware training (QAT), there is still an accuracy gap between such techniques and native training. To address this, we introduce CAGE (Curvature-Aware Gradient Estimation), a new QAT method that augments the straight-through estimator (STE) gradient with a curvature-aware correction designed to counteract the loss increase induced by quantization. CAGE is derived from a multi-objective view of QAT that balances loss minimization with the quantization constraints, yielding a principled correction term that depends on local curvature information. On the theoretical side, we introduce the notion of Pareto-optimal solutions for quantized optimization, and establish that CAGE yields strong convergence guarantees in the smooth non-convex setting. In terms of implementation, our approach is optimizer-agnostic, but we provide a highly efficient implementation that leverages Adam statistics. CAGE significantly improves upon the prior state-of-the-art methods in terms of accuracy, for similar computational cost: for QAT fine-tuning, it halves the compression accuracy loss relative to the prior best method, while for QAT pre-training of Llama models, its accuracy for 3-bit weights-and-activations (W3A3) matches the accuracy achieved at 4 bits (W4A4) with the prior best method. The official implementation can be found at https://github.com/IST-DASLab/CAGE.
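
For context, the plain straight-through estimator that CAGE builds on can be sketched on a one-parameter toy problem. The sketch below shows only the STE baseline (quantize in the forward pass, treat the quantizer's derivative as 1 in the backward pass); it does not include CAGE's curvature-aware correction, and the quantizer step size, loss, and function names are assumptions made for illustration.

```python
def quantize(w, step=0.25):
    """Round w to the nearest multiple of `step` (uniform quantizer)."""
    return round(w / step) * step


def ste_train(target, w0=0.0, lr=0.1, iters=200, step=0.25):
    """Minimize 0.5 * (quantize(w) - target)^2 over the latent weight w with STE."""
    w = w0
    for _ in range(iters):
        q = quantize(w, step)  # forward pass uses the quantized weight
        grad_q = q - target    # derivative of the loss with respect to q
        w -= lr * grad_q       # STE: pass the gradient through as if dq/dw = 1
    return w, quantize(w, step)
```

With `target = 0.8`, the latent weight climbs until its quantized value reaches the grid points next to the target (0.75 and 1.0) and then hovers around the boundary between them, a simple picture of the quantization-induced error that gradient corrections try to reduce.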

2507.23534 2026-06-19 cs.LG cs.CV (version update)

Continual Learning with Support Boundary Experience Blending

Chih-Fan Hsu, Ming-Ching Chang, Wei-Chao Chen

Affiliations: National Taiwan University

AI summary: Proposes the Experience Blending framework, which generates support boundary data with differential-privacy-inspired noise and jointly trains on exemplars and boundary data to regularize decision boundaries, improving continual-learning accuracy on several datasets.

Abstract

Continual learning (CL) seeks to mitigate catastrophic forgetting when models are trained on sequential tasks. A common approach, experience replay (ER), stores past exemplars but only sparsely approximates the data distribution, yielding fragile and oversimplified decision boundaries. We address this limitation by introducing Support Boundary Data (SBD), generated by injecting differential-privacy-inspired noise into latent features to create boundary-adjacent representations that implicitly regularize decision boundaries. Building on this idea, we propose Experience Blending (EB), a framework that jointly trains on exemplars and SBD through a dual-model aggregation strategy. EB has two components: (1) latent-space noise injection to generate support boundary data, and (2) end-to-end training that jointly leverages exemplars and SBD. Unlike standard experience replay, SBD enriches the feature space near decision boundaries, leading to more stable and robust continual learning. Extensive experiments on CIFAR-10, CIFAR-100, Tiny ImageNet, and ImageNet1K demonstrate consistent accuracy improvements of 10%, 6%, 14%, and 2%, respectively.
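
The latent-space noise injection step can be sketched as follows. The abstract does not specify the noise mechanism, so this sketch assumes the Gaussian-mechanism pattern from differential privacy (clip each feature vector's L2 norm, then add Gaussian noise); `max_norm`, `sigma`, and the function names are hypothetical, and the dual-model training is not shown.

```python
import math
import random


def clip_norm(vec, max_norm):
    """Scale vec down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= max_norm or norm == 0.0:
        return list(vec)
    scale = max_norm / norm
    return [v * scale for v in vec]


def make_boundary_features(latents, max_norm=1.0, sigma=0.1, seed=0):
    """Return noisy copies of latent vectors (clip, then add Gaussian noise)."""
    rng = random.Random(seed)
    noisy = []
    for vec in latents:
        clipped = clip_norm(vec, max_norm)
        noisy.append([v + rng.gauss(0.0, sigma) for v in clipped])
    return noisy
```

In a replay setting, the noisy vectors would be fed to the classifier head alongside the stored exemplars' clean features, with `sigma` controlling how far the synthetic points spread toward neighboring classes.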

2510.27285 2026-06-19 cs.CV cs.CR (version update)

Rethinking Robust Adversarial Concept Erasure in Diffusion Models

Qinghong Yin, Yu Tian, Heming Yang, Xiang Chen, Xianlin Zhang, Yue Ming, Xueming Li, Yue Zhang

Affiliations: Beijing University of Posts and Telecommunications; Dept. of Comp. Sci. and Tech., Institute for AI, Tsinghua University; University of Chinese Academy of Sciences; Nanjing University of Aeronautics and Astronautics

AI summary: Observing that adversarial training for concept erasure in diffusion models ignores concept semantics and therefore fits the concept space poorly, proposes S-GRACE, a semantics-guided robust adversarial concept erasure method that improves erasure performance by 26% and cuts training time by 90%.

Abstract

Concept erasure aims to selectively unlearn undesirable content in diffusion models (DMs) to reduce the risk of sensitive content generation. As a novel paradigm in concept erasure, most existing methods employ adversarial training to identify and suppress target concepts, thus reducing the likelihood of sensitive outputs. However, these methods often neglect the specificity of adversarial training in DMs, resulting in only partial mitigation. In this work, we investigate and quantify this specificity from the perspective of concept space, i.e., can adversarial samples truly fit the target concept space? We observe that existing methods neglect the role of conceptual semantics when generating adversarial samples, resulting in ineffective fitting of concept spaces. This oversight leads to the following issues: 1) when there are few adversarial samples, they fail to comprehensively cover the object concept; 2) conversely, they will disrupt other target concept spaces. Motivated by the analysis of these findings, we introduce S-GRACE (Semantics-Guided Robust Adversarial Concept Erasure), which gracefully leverages semantic guidance within the concept space to generate adversarial samples and perform erasure training. Experiments conducted with seven state-of-the-art methods and three adversarial prompt generation strategies across various DM unlearning scenarios demonstrate that S-GRACE significantly improves erasure performance by 26%, better preserves non-target concepts, and reduces training time by 90%. Our code is available at https://github.com/Qhong-522/S-GRACE.

2511.04514 2026-06-19 cs.LG (version update)

Linear Mode Connectivity under Data Shifts for Deep Ensembles of Image Classifiers

C. Hepburn, T. Zielke, A. P. Raulf

Affiliations: Institute for AI Safety & Security

AI summary: Experimentally studies the conditions for linear mode connectivity (LMC) under data shifts, finds that small learning rates and large batch sizes mitigate the impact of shifts, and exposes the trade-off LMC strikes between training efficiency and ensemble diversity.

Comments: 17 pages, 22 figures

Abstract

The phenomenon of linear mode connectivity (LMC) links several aspects of deep learning, including training stability under noisy stochastic gradients, the smoothness and generalization of local minima (basins), the similarity and functional diversity of sampled models, and architectural effects on data processing. In this work, we experimentally study LMC under data shifts and identify conditions that mitigate their impact. We interpret data shifts as an additional source of stochastic gradient noise, which can be reduced through small learning rates and large batch sizes. These parameters influence whether models converge to the same local minimum or to regions of the loss landscape with varying smoothness and generalization. Although models sampled via LMC tend to make similar errors more frequently than those converging to different basins, the benefit of LMC lies in balancing training efficiency against the gains achieved from larger, more diverse ensembles. Code and supplementary materials are available at https://github.com/DLR-KI/LMC. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
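
Linear mode connectivity is usually checked by evaluating the loss along the straight line between two solutions and measuring how far it rises above the linear interpolation of the endpoint losses. The sketch below computes that barrier for toy losses; the loss functions and names are illustrative stand-ins, not the paper's models or its exact LMC criterion.

```python
def interpolate(theta_a, theta_b, alpha):
    """Point on the straight line from theta_a (alpha=0) to theta_b (alpha=1)."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(theta_a, theta_b)]


def loss_barrier(loss_fn, theta_a, theta_b, num_points=11):
    """Largest excess of the path loss over the interpolated endpoint losses."""
    loss_a, loss_b = loss_fn(theta_a), loss_fn(theta_b)
    barrier = 0.0
    for i in range(num_points):
        alpha = i / (num_points - 1)
        path_loss = loss_fn(interpolate(theta_a, theta_b, alpha))
        baseline = (1.0 - alpha) * loss_a + alpha * loss_b
        barrier = max(barrier, path_loss - baseline)
    return barrier


def double_well(theta):
    # Toy loss with two separate minima at -1 and +1 in each coordinate.
    return sum((t * t - 1.0) ** 2 for t in theta)


def bowl(theta):
    # Toy convex loss with a single basin.
    return sum(t * t for t in theta)
```

For `double_well`, the two minima at -1 and +1 are separated by a barrier of 1.0 at the midpoint, so they are not linearly connected; for the convex `bowl`, the barrier between any two points is zero.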

2510.24399 2026-06-19 cs.CV cs.RO (version update)

GenTrack: A New Generation of Multi-Object Tracking

Toan Van Nguyen, Rasmus G. K. Christiansen, Dirk Kraft, Leon Bodenhagen

Affiliations: SDU Robotics, University of Southern Denmark

AI summary: Proposes GenTrack, a multi-object tracking method with a hybrid stochastic-deterministic strategy that combines particle swarm optimization with social interactions to maintain target identity consistency and reduce ID switches under weak detectors and occlusions.

Comments: This work has been submitted to the IEEE for possible publication

Abstract

This paper introduces a novel multi-object tracking (MOT) method, dubbed GenTrack, whose main contributions include: first, a hybrid tracking approach that combines stochastic and deterministic components to robustly handle unknown and time-varying numbers of targets, particularly in maintaining target identity (ID) consistency and managing nonlinear dynamics; second, the use of particle swarm optimization (PSO) with proposed fitness measures to guide stochastic particles toward their target distribution modes, enabling effective tracking even with weak and noisy object detectors; third, the integration of social interactions among targets to enhance PSO-guided particles and to improve continuous updates of both strong (matched) and weak (unmatched) tracks, thereby reducing ID switches and track loss, especially during occlusions; fourth, a GenTrack-based redefined visual MOT baseline incorporating a comprehensive state and observation model based on space consistency, appearance, detection confidence, track penalties, and social scores for systematic and efficient target updates; and fifth, the first publicly available source-code reference implementation with minimal dependencies, featuring three variants (GenTrack Simple, Strengthen, and Super) to facilitate flexible reimplementation. Experimental results show that GenTrack provides superior performance on standard benchmarks and real-world scenarios compared to state-of-the-art trackers, with integrated implementations of baselines for fair comparison. Potential directions for future work are also discussed. The source-code reference implementations of both the proposed method and the compared trackers are provided on GitHub: https://github.com/SDU-VelKoTek/GenTrack
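
The PSO step that pulls particles toward a mode of the target distribution can be illustrated with a generic, textbook particle swarm optimizer. This is not GenTrack's implementation: the fitness function in the usage note is a hypothetical stand-in for its proposed fitness measures, and the hyperparameters are common defaults chosen for the sketch.

```python
import random


def pso_maximize(fitness, dim, bounds, num_particles=20, iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Textbook PSO: return (best_position, best_fitness) found by the swarm."""
    rng = random.Random(seed)
    low, high = bounds
    pos = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(num_particles)]
    vel = [[0.0] * dim for _ in range(num_particles)]
    best_pos = [p[:] for p in pos]          # per-particle best positions
    best_fit = [fitness(p) for p in pos]    # per-particle best fitness values
    g = max(range(num_particles), key=lambda i: best_fit[i])
    global_pos, global_fit = best_pos[g][:], best_fit[g]
    for _ in range(iters):
        for i in range(num_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to own best, and pull to swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (global_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            fit = fitness(pos[i])
            if fit > best_fit[i]:
                best_pos[i], best_fit[i] = pos[i][:], fit
                if fit > global_fit:
                    global_pos, global_fit = pos[i][:], fit
    return global_pos, global_fit


def toy_fitness(p):
    # Hypothetical fitness: negative squared distance to a mode at (2, -1).
    return -((p[0] - 2.0) ** 2 + (p[1] + 1.0) ** 2)
```

Running `pso_maximize(toy_fitness, dim=2, bounds=(-5.0, 5.0))` drives the swarm's best position toward (2, -1), mirroring how fitness-guided particles concentrate near a target's likely state.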
