arXivDaily: arXiv daily academic digest, updated Monday through Friday
2602.01372 2026-05-11 math.OC cs.LG

Robust Sublinear Convergence Rates for Iterative Bregman Projections

Gabriel Peyré

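AI summary This paper studies the convergence rate of iterative Bregman projections under entropic regularization. It proposes a general analysis framework and proves an $O(1/k)$ dual convergence rate whose constant scales only linearly in $1/γ$, where $γ$ is the entropic regularization parameter; the authors call such rates "robust". The proof reduces to a primal bound and a dual bound in a quotient norm induced by the constraint split, combined with a non-expansiveness analysis of the dual iterations. Building on this framework, the paper also proposes a new flow-Sinkhorn algorithm for computing the Wasserstein-1 distance on graphs, with a theoretical guarantee on its computational complexity.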

Details
English abstract

Entropic regularization provides a simple way to approximate linear programs whose constraints split into two or more tractable blocks. The resulting objectives are amenable to cyclic Kullback-Leibler (KL) Bregman projections, with Sinkhorn-type algorithms for optimal transport, matrix scaling, and barycenters as canonical examples. This paper gives a general blueprint for proving $O(1/k)$ dual convergence rate with a constant that scales only linearly in $1/γ$, where $γ$ is the entropic regularization parameter. We call such rates "robust", because this mild dependence on $γ$ underpins favorable complexity bounds for approximating the unregularized problem via alternating KL projections. The blueprint reduces the proof to a uniform primal bound and a dual bound for a quotient norm induced by the constraint split. To make these inputs usable, we propose two helper results, which rely on the non-expansiveness of the dual iterations in this quotient dual norm. Instantiating this blueprint for graph-structured transport yields a new flow-Sinkhorn algorithm for the Wasserstein-1 distance on graphs. It achieves $\varepsilon$-additive accuracy on the transshipment cost in $O(p\,\mathrm{diameter}^3/\varepsilon^{4})$ arithmetic operations (up to logarithmic factors), where $p$ is the number of edges. We also provide a machine-checked Lean formalization of the core blueprint and its graph-$\mathrm{W}_1$ instantiation.
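
As a concrete instance of the alternating KL projections mentioned in the abstract, the following is a minimal sketch of the classical Sinkhorn iteration for entropic optimal transport between two discrete marginals. It is not the paper's flow-Sinkhorn algorithm; the function name `sinkhorn`, the fixed iteration count, and the toy inputs are illustrative assumptions.

```python
import math


def sinkhorn(cost, a, b, gamma, n_iter=500):
    """Classical Sinkhorn: alternate KL projections onto the row and
    column marginal constraints (illustrative sketch, pure Python)."""
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / gamma); gamma is the entropic parameter.
    K = [[math.exp(-cost[i][j] / gamma) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # KL projection onto {P : P 1 = a}: rescale rows.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # KL projection onto {P : P^T 1 = b}: rescale columns.
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) K diag(v).
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


if __name__ == "__main__":
    # Toy 2x2 problem (assumed data, for illustration only).
    plan = sinkhorn([[0.0, 1.0], [1.0, 0.0]], [0.3, 0.7], [0.6, 0.4], gamma=0.5)
    for row in plan:
        print(row)
```

Smaller values of `gamma` approximate the unregularized problem more closely but typically need more iterations, which is the dependence the paper's "robust" rates are concerned with.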

2601.22400 2026-05-11 quant-ph cs.AI

Spectral Filtering for Complex Linear Dynamical Systems

Elad Hazan, Annie Marsden

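AI summary This paper studies the problem of learning complex-valued linear dynamical systems (CLDS) with sector-bounded spectrum, a class that arises in signal processing, structured state space models, and quantum systems. The authors propose a spectral filtering method based on the Slepian basis and show that learnability is governed by an effective dimension that is independent of the state-space dimension. From this they derive dimension-free regret bounds for sequence prediction in CLDS, giving theoretical guarantees for efficient learning of such systems.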

Details
English abstract

We study the problem of learning complex-valued linear dynamical systems (CLDS) with sector-bounded spectrum. This class captures oscillatory and long-memory dynamics arising in signal processing, structured state space models, and quantum systems. We introduce a spectral filtering method based on the Slepian basis and show that learnability is governed by an effective dimension independent of the ambient state dimension. As a consequence, we obtain dimension-free regret bounds for sequence prediction in CLDS with spectrum contained in a sector of the unit disk.

2601.21951 2026-05-11 stat.ML cs.LG stat.CO

Diffusion Path Samplers via Sequential Monte Carlo

James Matthew Young, Paula Cordero-Encinar, Sebastian Reich, Andrew Duncan, O. Deniz Akyildiz

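AI summary This paper proposes diffusion-path-based samplers for target distributions known only up to a normalising constant. It builds a diffusion path from a simple base distribution to the target and combines it with sequential Monte Carlo to efficiently estimate the scores and densities of the time-varying distributions. To reduce the variance of the score estimates, the authors design practical control variate schedules, and they apply the framework to several diffusion path models. Theoretical analysis and experiments both support the effectiveness of the method.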

Details
English abstract

We develop diffusion-based samplers for target distributions known up to a normalising constant. To this end, we rely on the well-known diffusion path that smoothly interpolates between a simple base distribution and the target, popularised by diffusion models. We tackle the score estimation problem by developing an efficient sequential Monte Carlo sampler that evolves auxiliary variables from conditional distributions along the path, providing principled score and density estimates for time-varying distributions. To control the variance of score estimates, we further propose practical control variate schedules that incur minimal overhead. We adapt this general framework to paths induced by the Ornstein-Uhlenbeck (OU) time-reversal process, stochastic interpolants, and diffusion annealed Langevin dynamics, outlining their trade-offs. Finally, we provide theoretical guarantees and empirically demonstrate the effectiveness of our method on several synthetic and real-world datasets.

2601.07247 2026-05-11 stat.ML cs.LG math.ST stat.ME stat.TH

Multi-environment Invariance Learning with Missing Data

Yiran Jia, Jelena Bradic

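AI summary This paper studies multi-environment invariance learning in the presence of missing data, with the aim of improving causal interpretability and predictive robustness. The authors propose an estimator derived from the invariance objective and establish non-asymptotic guarantees on its variable selection property and $\ell_2$ error convergence rates, analysing how the proportion of missing data and the quality of the imputation models affect performance. Experiments show that the method still reduces prediction error even with a biased imputation model, provided the bias is within a reasonable range.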

Comments Added co-author

Details
English abstract

Learning models that can handle distribution shifts is a key challenge in domain generalization. Invariance learning, an approach that focuses on identifying features invariant across environments, improves model generalization by capturing stable relationships, which may represent causal effects when the data distribution is encoded within a structural equation model (SEM) and satisfies modularity conditions. This has led to a growing body of work that builds on invariance learning, leveraging the inherent heterogeneity across environments to develop methods that provide causal explanations while enhancing robust prediction. However, in many practical scenarios, obtaining complete outcome data from each environment is challenging due to the high cost or complexity of data collection. This limitation in available data hinders the development of models that fully leverage environmental heterogeneity, making it crucial to address missing outcomes to improve both causal insights and robust prediction. In this work, we derive an estimator from the invariance objective under missing outcomes. We establish non-asymptotic guarantees on variable selection property and $\ell_2$ error convergence rates, which are influenced by the proportion of missing data and the quality of imputation models across environments. We evaluate the performance of the new estimator through extensive simulations and demonstrate its application using the UCI Bike Sharing dataset to predict the count of bike rentals. The results show that despite relying on a biased imputation model, the estimator is efficient and achieves lower prediction error, provided the bias is within a reasonable range.

2512.19408 2026-05-11 math.NA cs.CE cs.NA cs.RO cs.SY eess.SY math.DS

Mixed formulation and structure-preserving discretization of Cosserat rod dynamics in a port-Hamiltonian framework

Philipp L. Kinon, Simon R. Eugster, Peter Betsch

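AI summary This paper proposes an energy-based modeling framework for the nonlinear dynamics of spatial Cosserat rods undergoing large displacements and rotations. The mixed formulation treats displacement, velocity, and stress variables independently and represents finite rotations with directors, which avoids singularities and keeps the mass matrix constant. The result is an infinite-dimensional port-Hamiltonian system with a quadratic energy functional. A structure-preserving finite element discretization yields a finite-dimensional system with port-Hamiltonian structure, which facilitates the design of energy-momentum consistent integration schemes and naturally accommodates dissipative material behavior and non-standard actuation. The framework offers a new approach to energy-momentum consistent formulations in computational mechanics involving finite rotations.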

Comments 39 pages, 16 figures

Details
English abstract

An energy-based modeling framework for the nonlinear dynamics of spatial Cosserat rods undergoing large displacements and rotations is proposed. The mixed formulation features independent displacement, velocity and stress variables and is further objective and locking-free. Finite rotations are represented using a director formulation that avoids singularities and yields a constant mass matrix. This results in an infinite-dimensional nonlinear port-Hamiltonian (PH) system governed by partial differential-algebraic equations with a quadratic energy functional. Using a time-differentiated compliance form of the stress-strain relations allows for the imposition of kinematic constraints, such as inextensibility or shear-rigidity. A structure-preserving finite element discretization leads to a finite-dimensional system with PH structure, thus facilitating the design of an energy-momentum consistent integration scheme. Dissipative material behavior (via the generalized-Maxwell model) and non-standard actuation approaches (via pneumatic chambers or tendons) integrate naturally into the framework. As illustrated by selected numerical examples, the present framework establishes a new approach to energy-momentum consistent formulations in computational mechanics involving finite rotations.

2512.14018 2026-05-11 cs.SE cs.AI

PerfCoder: Large Language Models for Interpretable Code Performance Optimization

Jiuding Yang, Shengyao Lu, Hongxuan Liu, Shayan Shirahmad Gale Bagi, Zahra Fazel, Tomasz Czajkowski, Di Niu

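AI summary PerfCoder is a family of large language models designed to generate high-performance code, addressing the limited ability of current models to optimize code performance. The models are fine-tuned on real-world optimization trajectories with human-readable annotations and aligned by reinforcement fine-tuning with runtime measurements, so that they propose and apply targeted, interpretable optimizations directly. Experiments show that PerfCoder significantly outperforms existing models on the PIE code performance benchmark, and that its interpretable feedback can improve the performance of larger models on code optimization tasks.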

Details
English abstract

Large language models (LLMs) have achieved remarkable progress in automatic code generation, yet their ability to produce high-performance code remains limited--a critical requirement in real-world software systems. We argue that current LLMs struggle not only due to data scarcity but, more importantly, because they lack supervision that guides interpretable and effective performance improvements. In this work, we introduce PerfCoder, a family of LLMs specifically designed to generate performance-enhanced code from source code via interpretable, customized optimizations. PerfCoder is fine-tuned on a curated collection of real-world optimization trajectories with human-readable annotations, and preference-aligned by reinforcement fine-tuning using runtime measurements, enabling it to propose input-specific improvement strategies and apply them directly without relying on iterative refinement. On the PIE code performance benchmark, PerfCoder surpasses all existing models in both runtime speedup and effective optimization rate, demonstrating that performance optimization cannot be achieved by scale alone but requires optimization strategy awareness. In addition, PerfCoder can generate interpretable feedback about the source code, which, when provided as input to a larger LLM in a planner-and-optimizer cooperative workflow, can further improve outcomes. Specifically, we elevate the performance of 32B models and GPT-5 to new levels on code optimization, substantially surpassing their original performance.

2512.05967 2026-05-11 cs.IR cs.AI cs.CL cs.LG

Enhancing Retrieval-Augmented Generation with Entity Linking for Educational Platforms

Francesco Granata, Francesco Poggi, Misael Mongiovì

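AI summary In the era of large language models, retrieval-augmented generation (RAG) architectures have attracted attention for grounding generated text in reliable knowledge sources, but in specialized domains RAG systems that rely on semantic similarity alone often suffer from terminological ambiguity that degrades retrieval accuracy. This paper proposes ELERAG, an enhanced RAG architecture that incorporates entity linking to improve the factual accuracy of educational question-answering systems, specifically in Italian. With a Wikidata-based entity linking module and a hybrid re-ranking strategy, ELERAG significantly outperforms the baseline and Cross-Encoder configurations on a domain-specific dataset, supporting the value of domain-adapted hybrid strategies for factual precision in educational RAG systems.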

Details
Journal ref
Big Data and Cognitive Computing, 10(4), 120. 2026
English abstract

In the era of Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) architectures are gaining significant attention for their ability to ground language generation in reliable knowledge sources. Despite their effectiveness, RAG systems based solely on semantic similarity often fail to ensure factual accuracy in specialized domains, where terminological ambiguity can affect retrieval relevance. This study proposes ELERAG, an enhanced RAG architecture that integrates a factual signal derived from Entity Linking to improve the accuracy of educational question-answering systems in Italian. The system includes a Wikidata-based Entity Linking module and implements a hybrid re-ranking strategy based on Reciprocal Rank Fusion (RRF). To validate our approach, we compared it against standard baselines and state-of-the-art methods, including a Weighted-Score Re-ranking, a standalone Cross-Encoder and a combined RRF+Cross-Encoder pipeline. Experiments were conducted on two benchmarks: a custom academic dataset and the standard SQuAD-it dataset. Results show that, in domain-specific contexts, ELERAG significantly outperforms both the baseline and the Cross-Encoder configurations. Conversely, the Cross-Encoder approaches achieve the best results on the general-domain dataset. These findings provide strong experimental evidence of the domain mismatch effect, highlighting the importance of domain-adapted hybrid strategies to enhance factual precision in educational RAG systems without relying on computationally expensive models trained on disparate data distributions. They also demonstrate the potential of entity-aware RAG systems in educational environments, fostering adaptive and reliable AI-based tutoring tools.
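
The Reciprocal Rank Fusion (RRF) step named in the abstract can be sketched as below. This is the standard RRF formula, score(d) = Σ 1/(k + rank), and not ELERAG's actual pipeline; the function name, the document IDs, and the conventional default `k=60` are assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs with standard RRF.

    Each document gets sum(1 / (k + rank)) over the lists it appears in,
    where rank is 1-based. k=60 is a commonly used default (assumption).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    # Hypothetical rankings: one from a dense retriever, one from an
    # entity-linking signal.
    dense = ["d1", "d2", "d3"]
    entity = ["d2", "d3", "d1"]
    print(reciprocal_rank_fusion([dense, entity]))
```

Because RRF uses only ranks, it can combine signals whose raw scores are on incomparable scales, which is one plausible reason to prefer it over a weighted-score re-ranking.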

2510.22944 2026-05-11 cs.CR cs.AI

Is Your Prompt Poisoning Code? Defect Induction Rates and Security Mitigation Strategies

Bin Wang, YiLu Zhong, MiDi Wan, WenJie Yu, YuanBing Ouyang, Yenan Huang, Hui Li

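AI summary This paper studies how benign but poorly formulated prompts affect the security of code generated by large language models. It proposes a prompt-quality evaluation framework covering goal clarity, information completeness, and logical consistency, and constructs the CWE-BENCH-PYTHON benchmark dataset. Experiments show that the lower the prompt normativity, the less secure the generated code, while advanced prompting techniques such as Chain-of-Thought and Self-Correction substantially improve code security. The study highlights that improving the quality of user prompts is a key strategy for strengthening the security of AI-generated code.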

Comments Accepted for publication in Empirical Software Engineering (EMSE) Journal

Details
English abstract

Large language models (LLMs) have become indispensable for automated code generation, yet the quality and security of their outputs remain a critical concern. Existing studies predominantly concentrate on adversarial attacks or inherent flaws within the models. However, a more prevalent yet underexplored issue concerns how the quality of a benign but poorly formulated prompt affects the security of the generated code. To investigate this, we first propose an evaluation framework for prompt quality encompassing three key dimensions: goal clarity, information completeness, and logical consistency. Based on this framework, we construct and publicly release CWE-BENCH-PYTHON, a large-scale benchmark dataset containing tasks with prompts categorized into four distinct levels of normativity (L0-L3). Extensive experiments on multiple state-of-the-art LLMs reveal a clear correlation: as prompt normativity decreases, the likelihood of generating insecure code consistently and markedly increases. Furthermore, we demonstrate that advanced prompting techniques, such as Chain-of-Thought and Self-Correction, effectively mitigate the security risks introduced by low-quality prompts, substantially improving code safety. Our findings highlight that enhancing the quality of user prompts constitutes a critical and effective strategy for strengthening the security of AI-generated code.

2510.00322 2026-05-11 cs.CR cs.CC cs.DS cs.LG

Privately Estimating Black-Box Statistics

Günter F. Steinke, Thomas Steinke

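AI summary This paper studies how to estimate statistics of arbitrary black-box functions under differential privacy. Standard methods rely on known bounds on the sensitivity of the estimator, which are often large or unavailable. The authors propose a scheme that trades off statistical efficiency (how much data is needed) against oracle efficiency (the number of function evaluations), and give lower bounds showing that the scheme is near-optimal.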

Details
English abstract

Standard techniques for differentially private estimation, such as Laplace or Gaussian noise addition, require guaranteed bounds on the sensitivity of the estimator in question. But such sensitivity bounds are often large or simply unknown. Thus we seek differentially private methods that can be applied to arbitrary black-box functions. A handful of such techniques exist, but all are either inefficient in their use of data or require evaluating the function on exponentially many inputs. In this work we present a scheme that trades off between statistical efficiency (i.e., how much data is needed) and oracle efficiency (i.e., the number of evaluations). We also present lower bounds showing the near-optimality of our scheme.
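
To illustrate why the standard techniques need a sensitivity bound, here is a minimal sketch of Laplace noise addition for a clipped mean. It shows the baseline the abstract contrasts against, not the paper's black-box scheme; the function names, the clipping bounds, and the replace-one neighbouring relation are assumptions, and a floating-point sampler like this one is not suitable for production privacy guarantees.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, lower, upper, epsilon, rng):
    """Epsilon-DP mean via the Laplace mechanism (illustrative sketch).

    Clipping to [lower, upper] is what provides the sensitivity bound:
    replacing one record changes the mean by at most (upper - lower) / n.
    """
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)
    sensitivity = (upper - lower) / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only so the demo is repeatable
    data = [4.2, 5.1, 3.8, 6.0, 4.9]  # assumed toy data
    print(private_mean(data, lower=0.0, upper=10.0, epsilon=1.0, rng=rng))
```

For an arbitrary black-box function there is no analogue of the clipping step, so no such sensitivity bound is available; that is the gap the paper targets.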

2509.08350 2026-05-11 physics.soc-ph cs.LG math.AT

Chordless cycle filtrations for dimensionality detection in complex networks via topological data analysis

Aina Ferrà Marcús, Robert Jankowski, Meritxell Vila Miñana, Carles Casacuberta, M. Ángeles Serrano

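AI summary This paper studies how topological data analysis can reveal the latent hyperbolic geometry of complex networks and estimate its dimensionality. The authors propose a topological weighting scheme for graphs based on chordless cycles and, combining algebraic topology with machine learning, build a neural network model that is trained on synthetic graphs and transfers to real-world networks without retraining. The approach provides a robust and effective way to uncover the hidden geometry of complex networks and to guide low-dimensional embedding.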

Details
English abstract

Many complex networks, ranging from social to biological systems, exhibit structural patterns consistent with an underlying hyperbolic geometry. Revealing the dimensionality of this latent space can disentangle the structural complexity of communities, impact efficient network navigation, and fundamentally shape connectivity and system behavior. We introduce a topological data analysis weighting scheme for graphs based on chordless cycles to estimate network dimensionality in a data-driven way. We further show that the resulting descriptors can effectively estimate network dimensionality using a neural network architecture trained on a synthetic graph database constructed for this purpose, which requires no retraining to transfer effectively to real-world networks. Thus, by combining cycle-aware filtrations, algebraic topology, and machine learning, our approach provides a robust and effective method for uncovering the hidden geometry of complex networks and guiding accurate modeling and low-dimensional embedding.

2508.10880 2026-05-11 cs.CR cs.AI cs.CL

Searching for Privacy Risks in LLM Agents via Simulation

Yanzhe Zhang, Diyi Yang

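AI summary This paper studies the privacy risks that LLM-based agents may pose in multi-turn interactions. It proposes a search-based framework that simulates privacy-critical agent interactions and alternates between improving attack and defense strategies. The method uses LLMs as optimizers to iteratively propose new agent instructions, and explores the strategy space more efficiently through parallel multi-thread search with cross-thread propagation. The study finds that attack strategies escalate from direct requests to sophisticated tactics such as impersonation and consent forgery, while defenses evolve from simple rule-based constraints to robust identity-verification state machines. The discovered attacks and defenses generalize across scenarios and backbone models, offering useful insights for building privacy-aware agents.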

Comments ICLR 2026

Details
English abstract

The widespread deployment of LLM-based agents is likely to introduce a critical privacy threat: malicious agents that proactively engage others in multi-turn interactions to extract sensitive information. However, the evolving nature of such dynamic dialogues makes it challenging to anticipate emerging vulnerabilities and design effective defenses. To tackle this problem, we present a search-based framework that alternates between improving attack and defense strategies through the simulation of privacy-critical agent interactions. Specifically, we employ LLMs as optimizers to analyze simulation trajectories and iteratively propose new agent instructions. To explore the strategy space more efficiently, we further utilize parallel search with multiple threads and cross-thread propagation. Through this process, we find that attack strategies escalate from direct requests to sophisticated tactics, such as impersonation and consent forgery, while defenses evolve from simple rule-based constraints to robust identity-verification state machines. The discovered attacks and defenses generalize across diverse scenarios and backbone models, providing useful insights for developing privacy-aware agents.

2508.02001 2026-05-11 cs.NI cs.LG

Versatile yet Efficient Network Traffic Analysis: Offloading Network Foundation Model to SmartNIC

Chungang Lin, Xuying Meng, Tianyu Zuo, Weiyao Zhang, Meng Shen, Ruijie Zhao, Guanming Che, Ruiqi Meng, Ziyue Huang, Haitong Luo, Zhiwei Xu, Yujun Zhang

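AI summary With pervasive traffic encryption, traffic analysis methods that depend on large-scale labeled data become impractical, while security operations require low-latency analysis at the edge. To reconcile versatility and efficiency, this paper presents Nepco, a system that offloads network foundation models to a SmartNIC. Through efficient modeling focused on localized byte regions and a hardware-friendly processing pipeline, Nepco achieves both versatility and low latency. Experiments show that Nepco keeps its analysis performance competitive while reducing end-to-end latency by 328x.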

Comments Under review

Details
English abstract

Pervasive encryption makes large-scale labeling infeasible for traffic analysis, while security operations demand edge analysis to avert service degradation and further vulnerabilities. These pressures have produced two disjoint research lines: 1) versatile analysis, via network foundation models for low label dependency, and 2) efficient analysis, via hardware offloading for low analysis latency. However, versatility and efficiency have appeared fundamentally incompatible to co-achieve, with prior work consistently sacrificing one for the other, yet we show that this incompatibility is a consequence of polarized design choices across the three components of traffic analysis systems, i.e., traffic processing, model architecture, and analysis execution. In response, we present Nepco, a versatile yet efficient network traffic analysis system that offloads network foundation models to SmartNIC. Our key observation is that discriminative traffic information is concentrated in localized byte regions, motivating versatile yet efficient localized byte-sequence modeling rather than inefficient global modeling. To exploit this without incurring the latency bottlenecks of complex encoding steps, we employ a hardware-friendly processing pipeline that directly embeds raw byte sequences. Crucially, to maintain versatility across diverse tasks, we propose a pattern-aware convolutional architecture equipped with dedicated scoring and gating mechanisms. By exploiting translation invariance, this design dynamically locates and extracts salient semantic signatures. We prototype Nepco on the Nvidia BlueField-3 SmartNIC with multiengine collaborative analysis execution. The experimental results demonstrate that Nepco achieves macro F1 competitive with the best performances achieved by 8 state-of-the-art network foundation models, while reducing end-to-end latency by 328x to the millisecond scale.

2506.04565 2026-05-11 cs.MA cs.CL

From Standalone LLMs to Integrated Intelligence: A Survey of Compound AI Systems

Jiayi Chen, Junyi Ye, Guiling Wang

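AI summary This survey covers Compound AI Systems (CAIS), which integrate large language models with external components such as retrievers, agents, and tools to overcome the limitations of standalone models in memory, reasoning, real-time grounding, and multimodal understanding. It proposes a multi-dimensional taxonomy based on component roles and orchestration strategies, analyzes four foundational paradigms (Retrieval-Augmented Generation (RAG), LLM agents, multimodal LLMs, and orchestration), summarizes the design trade-offs and evaluation methodologies of current systems, and identifies key challenges such as scalability and interoperability along with directions for future research.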

Details
English abstract

Compound AI Systems (CAIS) are an emerging paradigm that integrates large language models (LLMs) with external components, including retrievers, agents, tools, and orchestrators, to overcome the limitations of standalone models in tasks requiring memory, reasoning, real-time grounding, and multimodal understanding. These systems enable more capable and context-aware behaviors by composing multiple specialized modules into cohesive workflows. Despite growing adoption in both academia and industry, the CAIS landscape remains fragmented and lacks a unified framework for analysis, taxonomy, and evaluation. In this survey, we define the concept of CAIS, propose a multi-dimensional taxonomy based on component roles and orchestration strategies, and analyze four foundational paradigms: Retrieval-Augmented Generation (RAG), LLM Agents, Multimodal LLMs (MLLMs), and Orchestration. We review representative systems, compare design trade-offs, and summarize evaluation methodologies across these paradigms. Finally, we identify key challenges - including scalability, interoperability, benchmarking, and coordination - and outline promising directions for future research. This survey aims to provide researchers and practitioners with a comprehensive foundation for understanding, developing, and advancing the next generation of system-level artificial intelligence.

2505.11325 2026-05-11 stat.ME cs.AI cs.LG stat.CO stat.ML

Uncertainty Quantification for Prior-Data Fitted Networks using Martingale Posteriors

Thomas Nagler, David Rügamer

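AI summary This paper studies how to provide uncertainty quantification for prior-data fitted networks (PFNs), which perform strongly on tabular prediction tasks but do not quantify the uncertainty of their predictions. The authors propose a sampling procedure based on martingale posteriors that efficiently constructs Bayesian posteriors for estimates such as predictive means and quantiles without tuning, and prove its convergence. Experiments on several simulated and real-world datasets demonstrate the efficiency and calibration of the method.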

Details
English abstract

Prior-data fitted networks (PFNs) have emerged as promising foundation models for prediction from tabular datasets, achieving state-of-the-art performance on small to moderate data sizes without tuning. While PFNs are motivated by Bayesian ideas, they do not provide any uncertainty quantification for predictive means, quantiles, or similar quantities. We propose a principled, efficient, and tuning-free sampling procedure to construct Bayesian posteriors for such estimates based on martingale posteriors, and prove its convergence. Several simulated and real-world data examples showcase the efficiency and calibration of our method in inference applications.

2504.12922 2026-05-11 math.OC cs.LG math.LO math.PR

An abstract effective convergence theorem for stochastic processes, with applications to stochastic approximation

Morenikeji Neri, Nicholas Pischke, Thomas Powell

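AI summary This paper presents a general convergence theorem for stochastic processes satisfying a relaxed supermartingale condition. The theorem gives quantitative convergence guarantees at a high level of abstraction, formulated in terms of a general modulus $τ$ that captures an effective variant of uniqueness in expectation of the associated solutions. The resulting convergence rates are highly uniform and depend on very few data beyond $τ$. The authors then use the result as a unifying framework to derive quantitative versions of several key theorems, including the Robbins-Siegmund theorem, Dvoretzky's convergence theorem, and the convergence of stochastic quasi-Fejér monotone sequences, and discuss applications to stochastic approximation.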

Comments 25 pages

Details
English abstract

We provide a general theorem on the asymptotic behavior of stochastic processes that conform to a relaxed supermartingale condition. The distinguishing feature of our result is that it provides quantitative convergence guarantees at a much higher level of abstraction and generality than is typically seen in the stochastic approximation literature, formulated in particular in terms of a general modulus $τ$ that, on an intuitive level, captures an effective variant of the uniqueness in expectation of associated solutions. Our convergence rate is highly uniform, depending on very few data beyond $τ$. We then demonstrate the utility of our result as a unifying framework by deriving new quantitative versions of several key concepts and theorems from stochastic approximation, including the Robbins-Siegmund theorem, Dvoretzky's convergence theorem, and the convergence of stochastic quasi-Fejér monotone sequences, the latter formulated in a novel and highly general metric context. Throughout, we isolate and discuss special cases of our results which allow for the construction of fast, and in particular linear, rates. Various applications of our results and our general methodology to stochastic approximation are discussed, and in particular explicitly derived in related work of the authors.

2311.08433 2026-05-11 q-bio.QM cs.LG stat.AP

Clinical Characteristics and Laboratory Biomarkers in ICU-admitted Septic Patients with and without Bacteremia

Sangwon Baek, Seung Jun Lee

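AI summary This study examines the predictive value of clinical characteristics and laboratory biomarkers for bacteremia among septic patients in the intensive care unit. In a retrospective analysis of 218 patients, C-reactive protein (CRP) and procalcitonin (PCT) both showed good predictive ability for bacteremia, and a multivariable logistic regression model combining PCT, bilirubin, neutrophil-to-lymphocyte ratio (NLR), platelets, lactic acid, erythrocyte sedimentation rate (ESR), and Glasgow Coma Scale (GCS) score substantially improved predictive accuracy, reaching an AUC of 0.907. The study also found a significant association between bacteremia and mortality, suggesting that these biomarkers are useful for clinical diagnosis and prognostic assessment.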

Comments This research is not complete

Details
English abstract

Few studies have investigated the diagnostic utilities of biomarkers for predicting bacteremia among septic patients admitted to intensive care units (ICU). Therefore, this study evaluated the prediction power of laboratory biomarkers to utilize those markers with high performance to optimize the predictive model for bacteremia. This retrospective cross-sectional study was conducted at the ICU department of Gyeongsang National University Changwon Hospital in 2019. Adult patients qualifying SEPSIS-3 (increase in sequential organ failure score greater than or equal to 2) criteria with at least two sets of blood culture were selected. Collected data was initially analyzed independently to identify the significant predictors, which was then used to build the multivariable logistic regression (MLR) model. A total of 218 patients with 48 cases of true bacteremia were analyzed in this research. Both CRP and PCT showed a substantial area under the curve (AUC) value for discriminating bacteremia among septic patients (0.757 and 0.845, respectively). To further enhance the predictive accuracy, we combined PCT, bilirubin, neutrophil lymphocyte ratio (NLR), platelets, lactic acid, erythrocyte sedimentation rate (ESR), and Glasgow Coma Scale (GCS) score to build the predictive model with an AUC of 0.907 (95% CI, 0.843 to 0.956). In addition, a high association between bacteremia and mortality rate was discovered through the survival analysis (0.004). While PCT is certainly a useful index for distinguishing patients with and without bacteremia by itself, our MLR model indicates that the accuracy of bacteremia prediction substantially improves by the combined use of PCT, bilirubin, NLR, platelets, lactic acid, ESR, and GCS score.

2309.01751 2026-05-11 eess.IV cs.CV physics.geo-ph

Multispectral Indices for Wildfire Management

Afonso Oliveira, João P. Matos-Carvalho, Filipe Moutinho, Nuno Fachada

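AI summary As wildfires become more frequent and severe, traditional ground-based monitoring struggles to keep up with rapidly changing fire behavior and environmental conditions, calling for more advanced management methods. Through a literature review and two practical case studies, this paper examines the use of multispectral remote sensing imagery in wildfire management and evaluates several multispectral indices for extracting key environmental features such as vegetation, water bodies, and artificial structures. NDVI, MNDWI, and MSR stand out for segmentation and feature extraction, supporting improved wildfire monitoring, risk assessment, and emergency response.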

Comments The peer-reviewed version of this paper is published in Frontiers in Remote Sensing at https://doi.org/10.3389/frsen.2026.1807451. This version is typeset by the authors and differs only in pagination and typographical detail

Details
Journal ref
Frontiers in Remote Sensing, 7, 1807451, 2026
English abstract

The increasing frequency and severity of wildfires necessitate advanced methods for effective surveillance and management, as traditional ground-based techniques often struggle to adapt to rapidly changing fire behavior and environmental conditions. This study investigates the use of multispectral aerial and satellite imagery for wildfire management through an assessment of current literature and two practical case studies. We evaluate several multispectral indices for their ability to extract environmental features critical for analyzing wildfire behavior, including vegetation, water bodies, and artificial structures. Our results highlight NDVI for vegetation, MNDWI for water features, and MSR for artificial structures as particularly effective for segmentation and feature extraction. The application of these indices enhances wildfire data processing and supports improved monitoring, risk assessment, and response strategies, demonstrating the potential of multispectral imagery to complement traditional wildfire monitoring and management approaches.
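
For readers unfamiliar with the indices, the two normalized-difference indices named above have standard textbook definitions: the normalized difference vegetation index is (NIR - Red) / (NIR + Red), and MNDWI is (Green - SWIR) / (Green + SWIR). The sketch below computes them per pixel on flat lists of reflectance values. It is not the paper's processing code; the function names, the zero-denominator convention, and the sample values are assumptions.

```python
def normalized_difference(band_a, band_b):
    """Per-pixel (a - b) / (a + b); returns 0.0 where a + b == 0
    (a convention chosen here for illustration)."""
    result = []
    for a, b in zip(band_a, band_b):
        denom = a + b
        result.append((a - b) / denom if denom != 0 else 0.0)
    return result


def ndvi(nir, red):
    """Normalized Difference Vegetation Index: high over healthy vegetation."""
    return normalized_difference(nir, red)


def mndwi(green, swir):
    """Modified Normalized Difference Water Index: high over open water."""
    return normalized_difference(green, swir)


if __name__ == "__main__":
    # Assumed toy reflectances for three pixels.
    nir, red = [0.50, 0.30, 0.05], [0.10, 0.25, 0.04]
    green, swir = [0.08, 0.12, 0.10], [0.20, 0.15, 0.02]
    print(ndvi(nir, red))
    print(mndwi(green, swir))
```

Thresholding such an index map is one simple route to the segmentation of vegetation or water described in the abstract.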

2204.05551 2026-05-11 math.OC cs.LG cs.SY eess.SY math.DS

Near-Optimal Distributed Linear-Quadratic Regulator for Networked Systems

Sungho Shin, Yiheng Lin, Guannan Qu, Adam Wierman, Mihai Anitescu

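AI summary This paper studies the trade-off between the degree of decentralization and control performance in a distributed control setting. It introduces a graph-based $κ$-distributed controller in which each agent makes decisions using only state information within distance $κ$ on the graph, which ties the degree of decentralization to performance. Under mild assumptions, the performance gap between $κ$-distributed control and centralized optimal control decays exponentially in $κ$, showing that distributed control with a moderate degree of decentralization achieves near-optimal performance and is an effective architecture for large-scale networked systems.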

Details
Journal ref
SIAM Journal on Control and Optimization, 2023
English abstract

This paper studies the trade-off between the degree of decentralization and the performance of a distributed controller in a linear-quadratic control setting. We study a system of interconnected agents over a graph and a distributed controller, called $κ$-distributed control, which lets the agents make control decisions based on the state information within distance $κ$ on the underlying graph. This controller can tune its degree of decentralization using the parameter $κ$ and thus allows a characterization of the relationship between decentralization and performance. We show that under mild assumptions, including stabilizability, detectability, and a subexponentially growing graph condition, the performance difference between $κ$-distributed control and centralized optimal control becomes exponentially small in $κ$. This result reveals that distributed control can achieve near-optimal performance with a moderate degree of decentralization, and thus it is an effective controller architecture for large-scale networked systems.
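
The information structure behind $κ$-distributed control, in which each agent sees only states within graph distance $κ$, can be sketched with a breadth-first search. The code below computes only which agents' states each agent may use. It does not design a controller and is not taken from the paper; the function names and the adjacency-list representation are assumptions.

```python
from collections import deque


def k_hop_neighborhood(adjacency, source, kappa):
    """Nodes within graph distance kappa of source, via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == kappa:
            continue  # do not expand beyond distance kappa
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return set(dist)


def information_pattern(adjacency, kappa):
    """Map each agent to the set of agents whose state it may observe."""
    return {i: k_hop_neighborhood(adjacency, i, kappa) for i in adjacency}


if __name__ == "__main__":
    # Assumed toy network: a path graph 0 - 1 - 2 - 3.
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    for kappa in (0, 1, 3):
        print(kappa, information_pattern(path, kappa))
```

Setting `kappa = 0` gives a fully decentralized pattern and a `kappa` at least as large as the graph diameter gives the centralized one, which is the tuning range the abstract describes.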

2005.06674 2026-05-11 math.OC cs.LG math.DS

On the Convergence of Overlapping Schwarz Decomposition for Nonlinear Optimal Control

Sen Na, Sungho Shin, Mihai Anitescu, Victor M. Zavala

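AI summary This paper studies the convergence properties of an overlapping Schwarz decomposition algorithm for nonlinear optimal control problems. The algorithm partitions the time domain into overlapping subdomains, solves the subproblems in parallel, and attains convergence by updating primal-dual information at the subdomain boundaries. The authors prove local linear convergence with a rate that improves exponentially with the overlap size, and establish global convergence results for quadratic programming, which supports using the Schwarz scheme inside second-order optimization algorithms. Experiments on quadrotor motion planning and PDE control show that the approach is significantly more efficient than ADMM and as efficient as the centralized solver Ipopt.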

Comments 16 pages

Details
Journal ref
IEEE Transactions on Automatic Control, 2022
English abstract

We study the convergence properties of an overlapping Schwarz decomposition algorithm for solving nonlinear optimal control problems (OCPs). The algorithm decomposes the time domain into a set of overlapping subdomains, and solves all subproblems defined over subdomains in parallel. The convergence is attained by updating primal-dual information at the boundaries of overlapping subdomains. We show that the algorithm exhibits local linear convergence, and that the convergence rate improves exponentially with the overlap size. We also establish global convergence results for a general quadratic programming, which enables the application of the Schwarz scheme inside second-order optimization algorithms (e.g., sequential quadratic programming). The theoretical foundation of our convergence analysis is a sensitivity result of nonlinear OCPs, which we call "exponential decay of sensitivity" (EDS). Intuitively, EDS states that the impact of perturbations at domain boundaries (i.e. initial and terminal time) on the solution decays exponentially as one moves into the domain. Here, we expand a previous analysis available in the literature by showing that EDS holds for both primal and dual solutions of nonlinear OCPs, under uniform second-order sufficient condition, controllability condition, and boundedness condition. We conduct experiments with a quadrotor motion planning problem and a PDE control problem to validate our theory; and show that the approach is significantly more efficient than ADMM and as efficient as the centralized solver Ipopt.

2605.07517 2026-05-11 cs.IR cs.AI

LARAG: Link-Aware Retrieval Strategy for RAG Systems in Hyperlinked Technical Documentation

Giorgia Bolognesi, Claudio Estatico, Ulderico Fugacci, Isabella Mastroianni, Claudio Muselli, Luca Oneto

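AI summary LARAG is a retrieval-augmented generation (RAG) method for hyperlinked technical documentation that addresses the tendency of conventional embedding-based retrievers to ignore the hyperlink structure of documents. It encodes the hyperlink relations already present in technical documentation as metadata on the document chunks, enabling a graph-like retrieval of locally relevant content. Experiments show that LARAG maintains high answer quality while retrieving fewer chunks and generating fewer tokens, improving both the efficiency and the accuracy of the RAG system.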

Details
English abstract

Retrieval-Augmented Generation (RAG) enhances the factual grounding of Large Language Models by conditioning their outputs on external documents. However, standard embedding-based retrievers treat naturally structured corpora, such as technical manuals, as flat collections of passages, thereby overlooking the hyperlink topology that users rely on when navigating such content. We introduce LARAG (Link-Aware RAG): a lightweight, link-aware retrieval strategy that leverages the author-defined hyperlink structure already present in HTML documentation, encoding hyperlink relations as metadata in the chunk representations and exploiting them to perform a form of graph-like retrieval of locally relevant content. In a benchmark of twenty expert-designed queries over Rulex Platform technical documentation and four prompting strategies, LARAG consistently improves answer quality, achieving the highest BERTScore F1, while retrieving fewer chunks and generating fewer tokens than a baseline RAG architecture used for comparison. These results show that directly leveraging the existing hyperlink topology of technical documentation, even without explicit graph construction or inference, enables an implicit form of graph-like retrieval that yields a more faithful and efficient RAG pipeline, providing better grounding at lower cost.

2605.07481 2026-05-11 cs.CR cs.AI

Vaporizer: Breaking Watermarking Schemes for Large Language Model Outputs

Jonathan Hong Jin Ng, Anh Tu Ngo, Anupam Chattopadhyay

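AI summary This paper examines current state-of-the-art schemes for watermarking large language model (LLM) outputs and evaluates how well they withstand a range of meaning-preserving text attacks. The authors propose several attack strategies, including lexical substitution, machine translation, and neural paraphrasing, and measure attack effectiveness using BERT scores, text complexity, grammatical errors, and reading-ease indices. The experiments show that although robustness varies across watermarking methods, most watermarks can be removed without significantly affecting meaning, exposing the weaknesses of existing watermarking systems and pointing to directions for improvement.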

Details
English abstract

In this paper, we investigate the recent state-of-the-art schemes for watermarking the outputs of large language models (LLMs). These techniques are claimed to be robust, scalable and production-grade, aimed at promoting responsible usage of LLMs. We analyse the effectiveness of these watermarking techniques against an extensive collection of modified text attacks, which perform targeted semantic changes without altering the general meaning of the text content. Our approach encompasses multiple attack strategies, which include lexical alterations, machine translation, and even neural paraphrasing. The attack efficacy is measured with two target criteria: successful removal of the watermark and preservation of semantic content. We evaluate semantic preservation through BERT scores, text complexity measures, grammatical errors, and Flesch Reading Ease indices. The experimental results reveal varying levels of effectiveness among different watermarking models, with the same underlying result that it is possible to remove the watermark with reasonable effort. This study sheds light on the strengths and weaknesses of existing LLM watermarking systems, suggesting how they should be constructed to improve the security of available schemes.

2605.07472 2026-05-11 cs.CR cs.AI cs.MA

HBEE: Human Behavioral Entropy Engine -- Pre-Registered Multi-Agent LLM Simulation of Peer-Suspicion-Based Detection Inversion

Vickson Ferrel

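AI summary This study asks whether insider threat detection based on peer suspicion can identify an adaptive mole in a multi-agent simulation environment. In a pre-registered experiment comparing defender modes and adversary types, the authors find that the adaptive mole's behavior lowers the suspicion directed at it, so the detection result inverts. The study also indicates that conventional detection based on behavioral analytics may fail against adaptive adversaries, and it releases the simulator and associated data for further research.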

Comments 14 pages, 6 figures. Pre-registration document and full deviation log included in artifact

Details
English abstract

Insider threat detection assumes that an adaptive insider leaves behavioral residue distinguishing them from legitimate users. We test this assumption against an LLM-driven adaptive insider in a controlled multi-agent simulator. Our pre-registered five-condition study isolates defender mode (cascade vs. blind UEBA) crossed with adversary type (naive vs. adaptive OPSEC) plus a no-mole control, across 100 runs (95 valid after pre-committed exclusions). The primary finding is a detection inversion: at T_60, the adaptive mole's suspicion in-degree is statistically lower than a randomly selected innocent agent (Cliff's delta = -0.694, 95% BCa CI [-0.855, -0.519], Mann-Whitney p << 0.01). The pre-registered prediction was the opposite direction. A pre-registered equivalence test (H2) shows adaptive OPSEC produces no detectable shift in the mole's UEBA rank under either defender mode. The two detection signals (peer suspicion graph in-degree and per-agent UEBA rank) decouple under adaptive adversary behavior. We bound generalization explicitly: a pre-registered Gini calibration check (H4) returns FAIL, with HBEE pairwise message-exposure Gini (0.213) diverging from the SNAP Enron reference (0.730) by |Delta Gini| = 0.52, exceeding the equivalence bound by 5x. The paper makes a narrow but surprising claim: in a controlled environment where adaptive OPSEC is implementable as an LLM directive, peer-suspicion-cascade detection inverts. We release the simulator, pre-registration document, frozen scenarios, raw telemetry, and analysis pipeline under an open-source license.
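The headline effect size in the abstract, Cliff's delta, is defined as the fraction of cross-group pairs in which the first value exceeds the second, minus the fraction in which it is smaller. The sketch below computes it on invented toy in-degree values, not the study's telemetry; the names `mole` and `innocent` are illustrative assumptions.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: (#pairs with x > y minus #pairs with x < y) / (len(xs) * len(ys))."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))

# Invented toy suspicion in-degrees (NOT data from the paper).
mole = [0, 1, 1, 2]
innocent = [1, 2, 3, 3]

delta = cliffs_delta(mole, innocent)
print(delta)  # -0.6875
```

A negative delta means values in the first group tend to be smaller than those in the second, which is the direction of the reported inversion: the mole attracts less suspicion than a randomly chosen innocent agent.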

2605.07444 2026-05-11 cs.CE cs.AI

Accelerated and data-efficient flow prediction in stirred tanks via physics-informed learning

Mahdi Naderibeni, Liang Wu, David M. J. Tax

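AI summary This paper investigates physics-informed learning as a way to accelerate steady-state flow-field prediction in industrial-scale stirred tanks and make it more data-efficient. Using a dataset of steady flow fields generated with Reynolds-averaged Navier-Stokes (RANS) simulations, the authors compare purely data-driven implicit neural models with variants that add physics constraints. Prediction error falls steadily as the training set grows, with diminishing returns beyond a certain dataset size. Physics constraints improve accuracy and training stability and yield better tracer-transport behavior, but they also increase training complexity.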

Details
English abstract

The simulation of fluid flows is computationally expensive due to the complexity of its governing partial differential equations. Machine learning models offer a potential surrogate, enabling learning from simulations and significantly faster predictions of flow fields. However, these models require large training datasets, which introduces a trade-off between dataset generation cost and predictive accuracy. In this work, we investigate the relationship between the size of the training-set and accuracy of the prediction when learning steady flow fields in an industrial-scale stirred vessel. A data set of steady flows is generated using Reynolds Averaged Navier Stokes (RANS) simulations in a range of realistic operating conditions, including impeller speeds and liquid heights. We train implicit neural representations of flow fields and compare purely data-driven and constrained variants. Model performance is evaluated using global mean squared error (MSE), qualitative spatial comparisons of predicted and reference flow fields, and tracer transport simulations. We find that the prediction error decreases monotonically with increasing training data, but also that it exhibits clear diminishing returns beyond moderate dataset sizes. Physics-based constraints significantly improve accuracy and reduce variability across training runs in low-data regimes, and they lead to more stable tracer-transport behavior. Furthermore, reasonable interpolation can be achieved over different impeller speeds and liquid heights. However, these benefits come with an increase in the complexity of training, and their relative advantage diminishes as the training set grows.

2605.07422 2026-05-11 cs.SE cs.AI

Prompt Engineering Strategies for LLM-based Qualitative Coding of Psychological Safety in Software Engineering Communities: A Controlled Empirical Study

Moaath Alshaikh, Tasneem Alshaher, Ricardo Vieira, Beatriz Santana, Clelio Xavier, Jose Amancio, Glauco Carneiro, Julio Leite, Savio Freire, Manoel Mendonca

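AI summary This study explores how prompt engineering strategies affect LLM-based qualitative coding of psychological safety in software engineering communities. Comparing three LLMs under zero-shot and multi-shot closed-coding strategies, it finds that multi-shot prompting yields a statistically significant improvement in coding agreement for Claude Haiku, while the other models show no comparable gain. The study also reveals differences in stability across models and systematic prediction biases, providing empirical guidance for future LLM-based qualitative analysis.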

Comments 9 pages, 5 figures. Accepted at the 1st International Workshop on Prompt Engineering for Software Engineering (PROMPT-SE 2026), co-located with the 30th International Conference on Evaluation and Assessment in Software Engineering (EASE 2026), Glasgow, Scotland, United Kingdom, June 9--12, 2026

Details
English abstract

Qualitative analysis plays a pivotal role in understanding the human and social aspects of software engineering. However, it remains a demanding process shaped by the subjective interpretation of individual researchers and sensitive to methodological choices such as prompt design. Recent advancements in Large Language Models (LLMs) offer promising opportunities to support this type of analysis, although their reliability in reproducing human qualitative reasoning under varying prompting conditions remains largely untested. This study presents a controlled empirical evaluation of three LLMs -- Claude Haiku, DeepSeek-Chat, and Gemini 2.5 Flash -- across two prompt engineering strategies (zero-shot and multi-shot closed coding), using Cohen's kappa as the primary agreement metric over ten independent runs per configuration. Results suggest that multi-shot prompting significantly improves agreement for Claude Haiku (Delta kappa = +0.034, Wilcoxon p = 0.004) but not for DeepSeek-Chat or Gemini 2.5 Flash. Intra-model stability varies substantially -- DeepSeek-Chat and Claude Haiku exhibit the lowest variance (SD approx. 0.017), while Gemini 2.5 Flash is the least stable (SD = 0.038). A systematic over-prediction of "Sharing Negative Feedback" is identified across all models (bias ratios up to 5.25x), alongside consistent under-prediction of "Expressing Concerns." Collectively, these findings provide empirical evidence for prompt engineering guidelines in LLM-assisted qualitative coding for software engineering research.
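Cohen's kappa, the agreement metric named in the abstract, corrects raw agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it from two label sequences. The labels are invented for illustration and are not the study's data; the short codes `neg`, `concern`, and `other` are hypothetical stand-ins for coding categories.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[l] * count_b[l] for l in set(rater_a) | set(rater_b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Invented toy labels (NOT data from the paper); the "model" over-predicts "neg".
human = ["neg", "neg", "concern", "concern", "other", "other"]
model = ["neg", "neg", "neg", "concern", "other", "neg"]

kappa = cohens_kappa(human, model)
print(round(kappa, 3))  # 0.5
```

In the toy example the raw agreement is 4/6 and the chance agreement is 1/3, giving kappa = 0.5. Because kappa depends on the marginal label frequencies, a systematic over-prediction of one category lowers it even when many items still match.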

2605.07417 2026-05-11 cs.AR cs.LG

Effective and Memory-Efficient Alternatives to ECC for Reliable Large-Scale DNNs

Mohammad Hasan Ahmadilivani, Marten Roots, Marco Restifo, Sven-Markus Loorits, Luca Di Mauro, Jaan Raik

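AI summary As deep learning models are deployed ever more widely in critical domains such as autonomous driving and data centers, hardware faults pose a serious threat to system reliability. This paper studies the limitations of conventional ECC in protecting deep neural network parameters and proposes two efficient, memory-friendly alternatives, MSET and CEP. Experiments show that both methods significantly improve the reliability of large convolutional neural networks and vision transformers without adding memory overhead, and that they outperform conventional SECDED ECC in area and delay.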

Comments 7 pages, 7 figures, 3 tables. The paper is accepted at IEEE IOLTS'26

Details
English abstract

Modern Deep Learning (DL) workloads are increasingly deployed in safety-critical domains, such as automotive systems and hyperscale data centers, where transient hardware faults pose a serious threat to system reliability. These workloads are highly memory-intensive, and their correct functionality strongly depends on model parameters stored in memory, which are typically protected using Error Correction Codes (ECCs). In this work, we study ECC's impact on such models and propose two lightweight alternatives to ECCs that achieve superior reliability. The first approach, MSET, selectively hardens the most vulnerable bits in CNN and ViT parameters, while the second approach, CEP, provides fine-grained protection for all parameter bits. Experimental results demonstrate that both methods significantly enhance the reliability of large CNNs and ViTs, mostly outperforming conventional Single Error Correction, Double Error Detection (SECDED) ECC schemes, with no memory overhead and, in fact, with considerably lower area and delay than SECDED. The results also indicate that ViTs can be effectively protected by merely protecting their highest exponent bits in FP16 and FP32 representations. Furthermore, applying the CEP technique can guarantee the resilience of DNNs at bit error rates (BERs) up to one order of magnitude higher, with a 3.5x lower area overhead and a 7x faster decoder compared to SECDED ECC.
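Why the highest exponent bit is so critical can be seen by flipping single bits of an FP16 value. The sketch below is only an illustration of bit-level sensitivity under the IEEE 754 half-precision layout (1 sign bit, 5 exponent bits, 10 mantissa bits); it does not implement MSET or CEP, and the helper name `flip_bit_fp16` is an assumption.

```python
import numpy as np

def flip_bit_fp16(value, bit):
    """Return `value` (as FP16) with the given bit flipped.
    Bit 15 is the sign, bits 14..10 the exponent, bits 9..0 the mantissa."""
    raw = np.array([value], dtype=np.float16).view(np.uint16)
    raw ^= np.uint16(1 << bit)  # toggle the selected bit in the raw encoding
    return float(raw.view(np.float16)[0])

weight = 0.5
top_exponent_flip = flip_bit_fp16(weight, 14)  # highest exponent bit
mantissa_lsb_flip = flip_bit_fp16(weight, 0)   # lowest mantissa bit

print(top_exponent_flip)  # 32768.0
print(mantissa_lsb_flip)  # 0.50048828125
```

A single fault in the top exponent bit turns a typical small weight into a value several orders of magnitude larger, whereas a mantissa fault barely changes it. That asymmetry is consistent with the abstract's observation that protecting only the highest exponent bits can already be effective.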

2605.07414 2026-05-11 cs.MA cs.AI cs.CR

OrchJail: Jailbreaking Tool-Calling Text-to-Image Agents by Orchestration-Guided Fuzzing

Jianming Chen, Yawen Wang, Junjie Wang, Zhe Liu, Qing Wang, Fanjiang Xu

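AI summary This paper presents OrchJail, a jailbreak method for tool-calling text-to-image agents that uses guided fuzzing to attack the way tool chains are composed. It exploits high-risk tool-calling patterns and learns the causal relationships found in successful jailbreak cases, so that prompts capable of triggering unsafe behavior are generated more efficiently. Experiments show that OrchJail raises jailbreak success rates and image quality across several representative agents while lowering attack cost, highlighting tool orchestration as an important new class of safety vulnerability.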

Details
Journal ref
ICML 2026
English abstract

Tool-calling text-to-image (T2I) agents can plan and execute multi-step tool chains to accomplish complex generation and editing queries. However, this capability introduces a new safety attack surface: harmful outputs may arise from tool orchestration, where individually benign steps combine into unsafe results, making prompt-only jailbreak techniques insufficient. We present OrchJail, an orchestration-guided fuzzing framework for jailbreaking tool-calling T2I agents. Its core idea is to exploit high-risk tool-orchestration patterns: by learning from successful jailbreak tool-calling traces and their causal relationships to prompt wording, OrchJail directly guides the fuzzing search toward prompts that are more likely to trigger unsafe multi-step tool behaviors, rather than relying on surface-level textual perturbations. Extensive experiments demonstrate that OrchJail improves jailbreak effectiveness and efficiency across representative tool-calling T2I agents, achieving higher attack success rates, better image fidelity, and lower query costs, while remaining robust against common jailbreak defenses. Our work highlights tool orchestration as a critical, previously unexplored attack surface and provides a novel framework for uncovering safety risks in T2I agents.

2605.07389 2026-05-11 cs.SE cs.LG

Exploring CoCo Challenges in ML Engineering Teams: Insights From the Semiconductor Industry

A. Azamnouri, M. Haug, L. Woltmann, M. Fritz, J. Bogner, S. Wagner

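AI summary This paper explores the collaboration and communication (CoCo) challenges faced by machine learning engineering teams in the semiconductor industry. Through interviews with 12 practitioners at a global semiconductor company, the study identifies 16 recurring CoCo challenges, with unclear roles and responsibilities as the most critical. It also summarizes effective practices and recommendations, and shows how these challenges differ from those in software companies under hardware-driven constraints, pointing to directions for future research and tool development.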

Details
English abstract

The integration of machine learning (ML) into complex software systems has increased challenges in collaboration and communication (CoCo) of the teams building these systems. ML engineering (MLE) teams often involve diverse roles, ML engineers, data scientists, software engineers, and domain experts, each bringing unique goals, experiences, and jargon. These interdisciplinary dynamics can make it challenging to deploy, reproduce, and maintain ML-enabled systems over the long term. Previous studies have uncovered several CoCo challenges and practices, but most have focused on software-centric companies, leaving limited empirical understanding of how these dynamics unfold in hardware-centric contexts. In hardware-centric environments, CoCo challenges are shaped by additional constraints such as strict data governance, long development cycles, and tight coupling with physical processes, which amplify coordination complexity and reduce flexibility. To strengthen empirical understanding in such settings, we present a qualitative investigation of MLE teams within a global semiconductor company, where ML-enabled systems and manufacturing processes introduce additional complexity. We interviewed 12 practitioners regarding CoCo practices, tools, challenges, and approaches. Through analysis, we identified 16 recurring challenges, with unclear roles and responsibilities emerging as the most critical, and common practices and recommendations practitioners considered effective in mitigating CoCo problems. While grounded in a single organizational context, our findings align with known issues in interdisciplinary ML-enabled systems development, but also demonstrate how these challenges manifest differently under hardware-driven constraints. Our results highlight directions for future research and tool support to strengthen CoCo in MLE projects and ensure the success of ML-enabled systems.

2605.07385 2026-05-11 cs.GR cs.CV

Velocity-Space 3D Asset Editing

Hao Liu, Yuxuan Lin, Jingfeng Guo, Ruihang Chu, Junjie Wang, Ruotong Li, Yujiu Yang

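AI summary This paper proposes VS3D, a 3D asset editing method that aims to edit a local region of a 3D model precisely while leaving the rest unchanged. Whereas existing methods rely on external mechanisms to enforce locality, VS3D intervenes inside the ODE sampler, addressing identity leakage, the lack of a dedicated edit-amplification channel, and identity drag at the geometry and material stages. The framework is training-free and mask-free, and its three complementary modules each act at a different stage of the editing pipeline to improve the accuracy and fidelity of local edits.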

Details
English abstract

Editing a 3D asset locally, modifying a target region while preserving the rest, is a fundamental requirement of native 3D editing. Existing methods enforce locality through mechanisms external to the generator, such as manual 3D masks, post-hoc voxel merging, or 2D multi-view lifting. None of them intervene where the corruption actually originates: inside the ODE sampler. For a rectified-flow generator to achieve faithful local editing, its velocity field should be strong over the target editing region while vanishing on preserved content. Yet a single velocity field can hardly satisfy both requirements simultaneously, leading to three problems: (i) identity leakage that keeps the edit signal non-zero on preserved regions; (ii) no dedicated edit-amplification channel, so strengthening the edit inevitably perturbs identity; and (iii) an identity drag at the geometry and material stages, where a global condition pulls every token toward the target. We propose VS3D (Velocity-Space 3D Asset editing), an inversion-free, training-free, and mask-free framework that addresses each problem with a targeted intervention inside the sampler. VS3D integrates three complementary modules, each corresponding to a specific stage of the editing pipeline. Reconstruction-Anchored Source Injection (RASI) absorbs identity leakage by turning the unconditional embedding into a per-step, asset-specific anchor calibrated through source reconstruction. Partial-Mean Guidance (PMG) amplifies the edit signal by contrasting high- and low-quality subsample estimates of the velocity difference, active only where a consistent edit exists. Twin-Agreement Residual injection (TAR) lets the sampler decide token by token what to preserve at the geometry and material stages.

2605.07354 2026-05-11 eess.SP cs.CV

Task-Oriented Communication for Human Action Understanding via Edge-Cloud Co-Inference

Jingyi Liu, Cheng Yuan, Lijun He, Jun Zhang, Jiawei Shao

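AI summary With the spread of smart sensing, demand is growing for accurate human action understanding at the network edge. To address the heavy bandwidth consumption, high latency, and privacy leakage caused by transmitting video in conventional approaches, this paper proposes TOAU, a task-oriented communication framework based on edge-cloud collaboration. The method uses a monocular pose estimator to extract joint coordinates from video and a VQ-VAE to encode them as discrete motion tokens, so that only a small number of codebook indices are transmitted, greatly reducing payload and latency while protecting privacy. Experiments show that the system significantly improves communication efficiency while preserving action understanding accuracy.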

Comments 12 pages, 6 figures

Details
English abstract

The expanding application of smart sensing has created a growing demand for the accurate understanding of human action at the network edge. Traditional approaches require massive video data to be transmitted from resource-constrained edge devices to powerful cloud servers, incurring prohibitive uplink bandwidth consumption and unacceptable latency while raising privacy concerns. To overcome these bottlenecks, we propose a task-oriented communication framework for human action understanding (TOAU) through edge-cloud collaboration. Our framework utilizes a monocular pose estimator to extract continuous joint coordinates from raw videos, followed by a vector quantized variational autoencoder (VQ-VAE) to convert these coordinates into discrete motion tokens. Consequently, only a compact sequence of codebook indices is transmitted over the network, consuming as few as 9 bits per frame and avoiding privacy leakages. At the cloud server, a lightweight projector aligns these motion tokens with the embedding space of a large vision-language model (VLM) to facilitate complex action understanding, which is trained with an efficient instruction tuning paradigm. Comprehensive evaluations on three benchmarks demonstrate that our TOAU system reduces the transmission payload to approximately 1% and the system latency to around 20% compared to video codec-based solutions, while delivering comparable action understanding accuracy.
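The "9 bits per frame" figure is what one would get if each frame were sent as a single index into a 512-entry codebook, since 2^9 = 512. The abstract does not state the codebook size, so that reading is an assumption. The sketch below shows only the vector-quantization step (nearest-codeword lookup) with a random codebook; the codebook size, latent dimension, and variable names are hypothetical, and no encoder or decoder network is modeled.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical codebook: 512 codewords in a 16-dimensional latent space.
codebook = rng.normal(size=(512, 16))

# Pretend encoder outputs for three frames: noisy copies of known codewords.
latents = codebook[[3, 200, 511]] + 0.01 * rng.normal(size=(3, 16))

def quantize(z, codebook):
    """Map each latent vector to the index of its nearest codeword (squared L2)."""
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

indices = quantize(latents, codebook)
# Bits needed to address any codeword: 9 for a 512-entry codebook.
bits_per_index = (len(codebook) - 1).bit_length()

print(indices.tolist(), bits_per_index)
```

Only the integer indices would cross the network in such a scheme; the receiver looks them up in its own copy of the codebook, so neither raw video nor continuous joint coordinates are transmitted.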

2605.07314 2026-05-11 cs.IR cs.AI

DCGL: Dual-Channel Graph Learning with Large Language Models for Knowledge-Aware Recommendation

Xinchi Zou, Tongzhenzhi Su, Jianjun Li, Yuan Fu, Chang Liu, Zhiying Deng, Zhiwei Shen

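AI summary This paper proposes DCGL, a dual-channel graph learning framework designed to improve knowledge-aware recommendation. By decoupling semantic information from user behavioral patterns and combining multi-level contrastive learning with a dynamic fusion mechanism, it addresses inadequate modeling of semantic relationships, interference in embedding fusion, and variation in user-item interaction frequency when knowledge graphs are combined with large language models. Experiments show that DCGL outperforms existing methods on several real-world datasets, with especially clear gains in sparse scenarios.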

Comments Accepted by SIGIR 2026

Details
English abstract

Knowledge Graphs (KGs) have proven highly effective for recommendation systems by capturing latent item relationships, while recent integration of Large Language Models (LLMs) has further enhanced semantic understanding and addressed knowledge sparsity issues. Nevertheless, current KG-and-LLM-based methods still face three main limitations: 1) inadequate modeling of implicit semantic relationships beyond explicit KG links; 2) suboptimal single-channel fusion of ID and LLM embeddings, which often leads to signal interference and blurred representations; and 3) insufficient consideration of user-item interaction frequency variations in recommendation strategies. To address these challenges, we propose the Dual-Channel Graph Learning (DCGL) framework, featuring three key innovations: 1) a dual-channel architecture that structurally decouples rich semantic information from user behavioral patterns, preventing early interference; 2) a multi-level contrastive learning mechanism that enhances robustness against KG noise through intra-view contrasts and bridges semantic gaps between channels via inter-view alignment; and 3) a dynamic fusion mechanism that adaptively balances semantic generalization and behavioral specificity based on interaction frequency, resolving the cascading limitation. Extensive experiments on four real-world datasets show that DCGL consistently outperforms state-of-the-art methods, yielding substantial improvements in sparse scenarios while maintaining precision for active users. Our code is available at https://github.com/XinchiZou/DCGL.