arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.04592 2026-06-04 cs.CY cs.AI cs.HC

Synthetic Personalities: How Well Can LLMs Mimic Individual Respondents Using Socio-Economic Microdata?

Leonard Kinzinger, Jochen Hartmann

Affiliations: Technical University of Munich

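AI summary: Using data from the German Socio-Economic Panel, the study builds individual-level digital twins and evaluates how construction choices (model, information depth, embedding method, reasoning mode) affect the accuracy of more than 2 million twin responses. Information depth reaches a cost-efficient Pareto point at the 75 percent entropy quantile, and best-cell accuracy reaches 78.8%.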

Abstract

LLM-based digital twins promise to scale and accelerate market research, but most published twins are either coarse persona bots conditioned on a few demographic questions or detailed individual-level twins built on purpose-collected surveys and interview transcripts. Neither setup speaks to the operationally most relevant case for marketing practice: building detailed individual twins from the pre-existing heterogeneous panel data that firms already accumulate through CRM systems, loyalty programs, and repeat surveys. We construct detailed individual-level twins from the German Socio-Economic Panel (SOEP) and evaluate them across a $3 \times 5 \times 2 \times 2$ construction-method grid that covers three open-weights LLMs, five cumulative information depths ranked by normalized Shannon entropy, two embedding methods, and two reasoning modes, scoring over 2.1 million twin responses on 500 participants and 183 held-out questions. Twin quality rises with information depth but with diminishing returns past the 75 percent entropy quartile, which acts as a cost-efficient Pareto point relative to the best-performing 100 percent cells. Switching the embedding from a narrative persona summary to a raw dialog history of past responses raises hold-out accuracy in every model-by-reasoning cell at the 100 percent depth, while an explicit thinking mode raises rank-order correlation without moving accuracy. Best-cell accuracy reaches 78.8 percent and Fisher-$z$ correlation reaches $r = 0.590$ on the SOEP held-out evaluation set. The findings suggest that twin-based market research is no longer gated by data design, but by item volume, model selection, and a small set of construction-level decisions that this paper now maps.
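
The two statistics named in the abstract can be illustrated with a minimal sketch. It assumes that normalized Shannon entropy means the entropy of an item's observed answer distribution divided by the log of its number of distinct answers, and that the Fisher-$z$ correlation is obtained by averaging correlations in Fisher-$z$ space; the paper may define either differently. The item names, answers, and the descending ranking direction are hypothetical, not taken from SOEP.

```python
import math
from collections import Counter


def normalized_entropy(responses):
    """Shannon entropy of the observed answers, scaled to [0, 1].

    Assumption: normalization divides by log(number of distinct answers).
    """
    counts = Counter(responses)
    if len(counts) < 2:
        return 0.0  # a constant item carries no information
    n = len(responses)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(len(counts))


def fisher_z_mean(correlations):
    """Average correlations in Fisher-z space, then map back to r."""
    zs = [math.atanh(r) for r in correlations]
    return math.tanh(sum(zs) / len(zs))


# Hypothetical panel items with ten answers each.
items = {
    "employment_status": ["employed"] * 9 + ["unemployed"],
    "life_satisfaction": ["low", "mid", "high"] * 3 + ["mid"],
    "owns_home": ["yes", "no"] * 5,
}

# Rank items from most to least informative (direction is an assumption).
ranked = sorted(items, key=lambda k: normalized_entropy(items[k]), reverse=True)
pooled_r = fisher_z_mean([0.2, 0.8])
```

A cumulative information depth could then be formed by taking the first 25, 50, 75, or 100 percent of `ranked`, which is one plausible reading of the depths described in the abstract.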

2606.04582 2026-06-04 physics.comp-ph cs.LG physics.app-ph

Reconstructing Unobservable Temperature Fields via Simulation-Aided Intelligent Sensing

Monika Stipsitz, Hèlios Sanchis-Alepuz, Jacob Reynvaan, Silvester Sabathiel

Affiliations: Silicon Austria Labs; Republic of Austria; Styrian Business Promotion Agency; federal state of Carinthia; Upper Austrian Research; Austrian Association for the Electric and Electronics Industry

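AI summary: Proposes generating datasets from randomized physics-based simulations and training a neural network on them to reconstruct internal temperature fields from sparse sensors, enabling real-time online monitoring.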

Comments Presented at IEEE International Instrumentation and Measurement Technology Conference (I2MTC), Nancy, France, 2026

Abstract

Real-time monitoring of the temperature distribution within components and sub-structures is a challenging topic in many systems due to restrictions on feasible sensor locations. While machine learning (ML) proves a versatile tool in many applications, its adoption for high-resolution thermal monitoring is hindered by the limited availability of high-quality datasets for training. In this work, we propose a novel approach for generating datasets for industrial applications based on randomized physics-based simulations. We demonstrate the approach in a proof-of-concept hardware setup: a neural network (NN) trained only on such a synthetic dataset is used to reconstruct the internal temperature field from sparse sensors embedded in the hardware. The NN-based reconstructions not only outperform Kriging in robustness but also enable real-time inference, making the method suitable for online monitoring of otherwise unobservable thermal states.

2606.04576 2026-06-04 stat.ML cs.LG econ.EM q-fin.RM

ReSGA: A Large Tail Risk Model for Learning Value-at-Risk and Expected Shortfall

Yichi Zhang, Ke Zhu, Zhoufan Zhu

Affiliations: Hong Kong University; Xiamen University

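AI summary: Proposes the retrieval-enhanced self-grouping autoencoder (ReSGA), which uses millions of parameters to capture cross-sectional dependence and long-term temporal dynamics of assets. On US equity data from 1926 to 2023 it outperforms 12 benchmark models, and a new size-enhanced left-side momentum strategy turns the forecasts into economic gains.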

Abstract

Learning Value-at-Risk (VaR) and Expected Shortfall (ES) is important for managing financial risks effectively. Existing approaches with limited parameters are vulnerable to model misspecification in the era of big data. To address this limitation, we propose a large tail risk model, the retrieval-enhanced self-grouping autoencoder (ReSGA), which is designed with millions of parameters to exploit the rich cross-sectional dependence and long-term temporal dynamics of assets using their characteristics. Applied to monthly US equity returns from 1926 to 2023 with 153 firm characteristics, ReSGA outperforms twelve econometric and machine learning competitors in terms of out-of-sample loss and statistical backtesting. In addition, its forecast advantages can translate into significant economic gains from long-short decile portfolios that are constructed by a new size-enhanced left-side momentum strategy. To clarify the role of complexity, we further conduct a systematic scaling analysis and demonstrate that improvements in joint VaR-ES forecasting are primarily driven by data complexity rather than model complexity. Finally, our analyses of group-importance and transfer-learning exhibit the interpretability and cross-market generalizability of ReSGA.
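
For readers unfamiliar with the two risk measures, the sketch below computes them with the plain historical (empirical) estimator. This is a textbook baseline, not ReSGA and not any of the paper's competitors; the function name and the sample returns are hypothetical. VaR at level alpha is taken as the loss at the edge of the worst alpha fraction of outcomes, and ES as the average loss within that tail.

```python
import math


def historical_var_es(returns, alpha=0.05):
    """Empirical VaR and ES at tail level alpha, as positive loss numbers.

    Uses the worst floor(alpha * n) observations (at least one) as the tail.
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    losses = sorted((-r for r in returns), reverse=True)  # largest loss first
    k = max(1, math.floor(alpha * len(losses)))
    tail = losses[:k]
    var = tail[-1]          # smallest loss that is still inside the tail
    es = sum(tail) / k      # mean loss inside the tail
    return var, es


# Hypothetical monthly returns.
monthly_returns = [-0.10, -0.05, -0.02, 0.0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06]
var_20, es_20 = historical_var_es(monthly_returns, alpha=0.2)
```

By construction ES is never smaller than VaR at the same level, which is why the two are usually forecast jointly.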

2606.04527 2026-06-04 cs.MM cs.CV cs.GR

Echo-Infinity: Learning Evolving Memory for Real-Time Infinite Video Generation

Yuxuan Bian, Zeyue Xue, Songchun Zhang, Shiyi Zhang, Weiyang Jin, Yaowei Li, Junhao Zhuang, Haoran Li, Jie Huang, Haoyang Huang, Nan Duan, Qiang Xu

Affiliations: The Chinese University of Hong Kong; Joy Future Academy, JD; The Hong Kong University of Science and Technology; Tsinghua University; The University of Hong Kong; University of Science and Technology of China

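AI summary: Proposes the Echo-Infinity framework, whose learnable evolving memory dynamically filters, abstracts, and compresses history of any length at constant cost. Combined with a unified relative RoPE scheme, it demonstrates 24-hour real-time infinite video generation for the first time.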

Comments Website: https://echo-team-joy-future-academy-jd.github.io/Echo-Infinity/

Abstract

We present Echo-Infinity, an autoregressive (AR) framework towards real-time infinite video generation that employs a learnable evolving memory to dynamically filter, abstract, and compress any-length history at constant cost. Existing methods mainly curate memory with predefined KV-cache schedules, fixed-ratio heuristic compression, or inference-time RoPE adaptation. These designs inevitably lose historical information and amplify compounding errors due to their limited cache window and ignorance of autoregressive generation noise. Inspired by human memory consolidation, Echo-Infinity replaces handcrafted memory curation with learnable Memory Queries, which are updated by attention and a gating mechanism when past frames are evicted from the local window. The queries are optimized end-to-end with the video diffusion transformers (DiTs), forming an evolving memory that supports arbitrary compression ratios with constant computation independent of video length. They also act as a generalizable generation prior, improving quality even when only the optimized initial state is used. We further introduce the Unified Relative RoPE Recipe, which anchors the sink frames to start from id 0 and lets the newest frame id grow at most to the DiTs' pretrained maximum temporal RoPE id throughout training and inference, freeing the model from the finite RoPE constraint and closing the train-test RoPE extrapolation gap. In long and short video generation, Echo-Infinity achieves state-of-the-art performance, and, to our knowledge, demonstrates promising 24-hour (>1.3 M frames) real-time rollouts for the first time, suggesting a practical path toward infinite video generation.

2606.04522 2026-06-04 cs.IR cs.AI cs.DB cs.LG

ANN Search: Recall What Matters

Dimitris Dimitropoulos, Nikos Mamoulis

Affiliations: University of Ioannina; Archimedes, Athena RC

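AI summary: Proposes replacing Recall@k with the inverse approximation ratio 1/Ratio@k for evaluating approximate nearest neighbor search quality. Experiments show that 1/Ratio@k reflects actual utility more accurately and lowers computational overhead.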

Abstract

Approximate nearest neighbor (ANN) search has become a core primitive in information retrieval and modern machine learning tasks, from classification to retrieval-augmented generation. The community evaluates and tunes ANN algorithms primarily on their throughput at a given Recall@k, the fraction of true exact neighbors retrieved. We argue that what really matters in ANN search is the quality of the retrieved results and not their overlap with the true kNN set. We show that using Recall@k to assess retrieval quality forces unnecessary computational overhead and investigate replacing it by 1/Ratio@k, the inverse approximation ratio. 1/Ratio@k evaluates the differences between the distances of the retrieved and true neighbors. It is judge-free, hyperparameter-free, and computable from standard ANN benchmark inputs alone. We benchmark state-of-the-art ANN algorithms across diverse datasets spanning a wide range of intrinsic dimensionalities, evaluating the two metrics comprehensively across efficiency, downstream classification, and retrieval-augmented generation. On the efficiency axis, optimizing for 1/Ratio@k reaches operational quality thresholds at a substantially lower computational cost than Recall@k. In downstream tasks, performance indicators (label precision, semantic similarity, BERTScore, and LLM-graded quality) remain highly stable even when Recall@k drops significantly. The inverse approximation ratio, on the other hand, closely mirrors this stability, tracking true utility much better than Recall@k. Ultimately, while Recall@k overstates the true cost of approximation, 1/Ratio@k offers a more accurate, deployable proxy for actual ANN quality.
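
The contrast between the two metrics can be seen on a toy query. The sketch below assumes the common definition of the approximation ratio, namely the mean over ranks of (distance to the i-th retrieved neighbor) / (distance to the i-th true neighbor); the abstract does not spell out the formula, so this is an assumption, and the points and function names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recall_at_k(retrieved_ids, true_ids):
    """Fraction of the true k nearest neighbors that were retrieved."""
    return len(set(retrieved_ids) & set(true_ids)) / len(true_ids)


def inverse_ratio_at_k(query, points, retrieved_ids, true_ids):
    """1 / Ratio@k under the assumed rank-wise distance-ratio definition."""
    d_ret = sorted(euclidean(query, points[i]) for i in retrieved_ids)
    d_true = sorted(euclidean(query, points[i]) for i in true_ids)
    ratios = []
    for dr, dt in zip(d_ret, d_true):
        if dt == 0:
            # Exact-match true neighbor: perfect only if retrieved is also exact.
            ratios.append(1.0 if dr == 0 else math.inf)
        else:
            ratios.append(dr / dt)
    return len(ratios) / sum(ratios)


# Hypothetical 1-D dataset: three near-equidistant points and one far point.
points = [(1.0,), (1.01,), (1.02,), (5.0,)]
query = (0.0,)
true_ids = [0, 1]        # exact 2 nearest neighbors
retrieved_ids = [0, 2]   # ANN result misses point 1 but returns a near-tie

recall = recall_at_k(retrieved_ids, true_ids)
inv_ratio = inverse_ratio_at_k(query, points, retrieved_ids, true_ids)
```

In this example recall drops to one half although the retrieved distances are within about one percent of the true ones, which is the kind of gap between overlap and quality that the abstract argues about.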

2606.04517 2026-06-04 cs.NI cs.AI

Treat Traffic Like Trees: A Semantic-Preserving Hierarchical Graph-Based Expert Framework for Encrypted Traffic Analysis

Yuantu Luo, Jun Tao, Linxiao Yu, Guang Cheng

Affiliations: School of Cyber Science and Engineering, Southeast University; Purple Mountain Laboratories; Engineering Research Center of Blockchain Application, Supervision and Management (Southeast University); Engineering Research Center of Security for Ubiquitous Network, Jiangsu Province

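AI summary: Proposes PTGAMoE (Protocol Tree Graph Attention with Mixture of Experts), a semantic-preserving hierarchical graph-based expert framework. With field-level graph construction and an expert committee design, it significantly outperforms existing models under strict no-data-leakage settings and provides interpretable protocol-level feature importance analysis.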

Comments This work has been submitted to the IEEE for possible publication

Abstract

Graph-based deep learning methods have been widely employed in encrypted traffic analysis to exploit latent correlations across different granularities. However, while complex preprocessing pipelines and sophisticated model structures often achieve strong performance, they may obscure inherent protocol semantics during representation learning. Moreover, the hierarchical structure of protocol layers and their corresponding fields, defined by protocol specifications and routinely utilized in manual traffic analysis, remains underexplored in existing learning frameworks. In this paper, we propose Protocol Tree Graph Attention with Mixture of Experts (PTGAMoE), a semantic-preserving hierarchical graph-based expert framework for encrypted traffic analysis. The field-based graph construction and expert committee design enable PTGAMoE to quantify the model's preferences for specific fields and protocols. Extensive experimental results on representative benchmark datasets under strict no-data-leakage settings demonstrate that PTGAMoE significantly outperforms state-of-the-art (SOTA) models. Furthermore, the semantic-preserving design provides interpretable insights into protocol-level feature importance and expert-level contributions, reflecting the model's decision-making logic in encrypted traffic classification tasks.

2606.04499 2026-06-04 cs.SI cs.LG

Modeling and Interpreting Teamwork Dynamics in Cancer Care Outcome Prediction

Yuhua Huang, Hsiao-Ying Lu, Kwan-Liu Ma

Affiliations: University of California, Davis

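AI summary: Uses collaboration networks derived from electronic health records together with machine learning to study how teamwork dynamics among healthcare professionals affect survival prediction for cancer patients, and interprets the key network characteristics.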

Abstract

Cancer care requires a longitudinal approach in which treatments are planned and delivered over time according to the needs of each individual patient. While prior research has thoroughly explored how clinical and demographic factors, such as comorbidities and age, inform treatment planning, far less attention has been devoted to the delivery phase of care. Yet planning and delivery are both team-based processes that depend on coordinated efforts among multiple healthcare professionals (HCPs). As such, the human factors embedded in these collaborative practices are crucial to optimizing patient outcomes. Despite this importance, the existing literature on human factors in cancer care is limited, and very few studies have investigated how collaboration within care teams evolves over the course of treatment. To fill this gap, this work examines how HCPs' collaboration, captured through electronic health record (EHR) systems, affects cancer patient outcomes, with particular emphasis on teamwork dynamics. We represent EHR-mediated HCP interactions as networks and apply machine learning methods to identify predictive signals of patient survival embedded in these collaborative structures. We further interpret model predictions by pinpointing network characteristics and dynamic patterns associated with particular outcomes. We evaluate our model through robustness analyses to ensure that the findings are stable and not driven by stochastic variation in training. Additionally, our insights align with hypotheses proposed in the medical literature, and our results provide empirical, data-driven evidence supporting these claims. Overall, our work contributes a practical workflow for leveraging digital traces of collaboration to evaluate and strengthen longitudinal team-based healthcare, offering actionable insights to guide data-informed interventions in healthcare delivery.

2606.04486 2026-06-04 cs.CR cs.CL cs.LG stat.ML

Global Sketch-Based Watermarking for Diffusion Language Models

Daniel Zhao

Affiliations: Harvard University

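AI summary: Proposes a watermark for masked diffusion language models based on a global, vector-valued sketch that controls aggregate statistics of the text, so that detection is independent of local context.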

Abstract

Watermarking methods for language models have been studied extensively in the autoregressive setting, where tokens are generated sequentially. These works largely focus on local-context schemes that perturb the next token's distribution as a function of its preceding tokens. In diffusion language models, distributions over many unresolved positions are jointly sampled, allowing additive statistics of the entire sequence to be tractable during generation. We propose a watermark for masked diffusion language models that controls a global, vector-valued sketch representation of the text. Compared to context-dependent watermarking, the sketch formulation decouples detection from the local contexts seen during generation, resulting in an order-agnostic statistic and a watermarking rule which does not manifest as a simple token bias. We analyze the distortion, soundness, and robustness properties of the method.

2606.04460 2026-06-04 cs.CR cs.AI cs.LG

CyberGym-E2E: Scalable Real-World Benchmark for AI Agents' End-to-End Cybersecurity Capabilities

Tianneng Shi, Robin Rheem, Dongwei Jiang, Mona Wang, Francisco De La Riega, Zhun Wang, Jingzhi Jiang, Alexander Cheung, Sean Tai, Jonah Cha, Jianhong Tu, Gabriel Han, Chenguang Wang, Jingxuan He, Wenbo Guo, Dawn Song

Affiliations: Stanford University; UC Berkeley

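AI summary: Proposes CyberGym-E2E, a large-scale, realistic end-to-end cybersecurity benchmark. An automated pipeline turns open-source vulnerability data into evaluation environments that assess AI agents across the full lifecycle of vulnerability discovery, PoC generation, and patch generation.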

Comments ICML 2026

Abstract

AI has the potential to transform cybersecurity by enabling systems that can autonomously detect, analyze, and remediate software vulnerabilities. However, existing cybersecurity evaluations of AI systems are limited in scale or scope, and fail to capture the end-to-end lifecycle of real-world software vulnerability discovery and remediation. To address this gap, we propose CyberGym-E2E, a large-scale and realistic end-to-end cybersecurity benchmark that comprehensively evaluates AI agents' abilities across the full lifecycle of vulnerability discovery, PoC generation, and patch generation. CyberGym-E2E is comprehensive and scalable, as we build an automated, agent-enhanced pipeline for transforming open-source vulnerability data into realistic evaluation environments. Currently, the benchmark consists of 920 real-world vulnerabilities across 139 different open-source projects.

2606.04459 2026-06-04 cs.CR cs.AI cs.CC cs.CL

Token Rankings are Unforgeable Language Model Signatures

Matthew Finlayson, Andreas Grivas, Xiang Ren, Swabha Swayamdipta

Affiliations: University of Southern California; University of Edinburgh

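AI summary: Finds that a language model's token rankings (ordering by probability) form a unique, unforgeable signature, and studies how a restricted API can balance presenting the signature against leaking parameters.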

Abstract

Language model parameters are known to impose unique (to each model) geometric constraints on their logit outputs, which serves as a signature that identifies the model, but also leaks the model's final layer parameters when an API distributes logits. We investigate more restrictive APIs that expose token rankings (i.e., their ordering by probability, but not the probability values) and find that rankings also constitute a signature: every model has a unique set of feasible top-$k$ rankings for sufficiently large $k$. Furthermore, the ranking signature is the first known (polynomially) unforgeable signature, since finding a model with the same set of feasible rankings is NP-hard. On the security front, we find that token rankings are already sufficient to approximately steal the final layer of the model, similar to logits, though the approximation is too coarse to forge the signature, and can be effectively countered by restricting the API to top-$k$ tokens with sufficiently small $k$. Since the top-$k$ required to present the model signature is generally smaller than the $k$ required to prevent stealing, it is possible for an API to present an unforgeable signature without leaking model parameters.

2606.04446 2026-06-04 cs.DC cs.LG

D^2SD: Accelerating Speculative Decoding with Dual Diffusion Draft Models

Liyuan Zhang, Jiarui Zhang, Jinwei Yao, Ran Yan, Yuchen Yang, Jiahao Zhang, Tongkai Yang, Yi Wu, Binhang Yuan

Affiliations: Peking University; Tsinghua University; HKUST; UIUC; Ant Group

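AI summary: Proposes the D^2SD framework, which uses dual diffusion draft models and a confidence-guided prefix tree to raise the acceptance rate of speculative decoding, outperforming existing diffusion methods and autoregressive speculative decoding baselines.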

Abstract

Speculative decoding accelerates autoregressive large language model inference by drafting multiple tokens and verifying them in a single target-model forward pass. Recent diffusion-based drafters generate an entire block of tokens in parallel but usually commit to a single draft sequence per verification: once the first mismatch occurs, all subsequent draft tokens are discarded, resulting in a limited acceptance rate. Naively batching more draft candidate sequences only introduces a marginal improvement, as redundant or poorly placed branches increase the cost of drafting and verification without proportionally increasing the number of accepted tokens. We propose D^2SD, a dual diffusion draft speculative decoding framework that organizes candidates into a confidence-guided prefix tree, where the first diffusion drafter generates a block along with per-position confidence scores that are used to identify the most likely rejection boundary and select the top-K prefix ranges for recovery; the second variable-prefix diffusion drafter re-anchors at each selected prefix and proposes alternative continuations in one batched pass; the resulting shared-prefix candidates are jointly verified via cascade attention. Empirically, D^2SD shows clear improvements over both the underlying diffusion approach and strong autoregressive speculative decoding baselines.
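
The first-mismatch rule described above, and why a second continuation anchored at the rejection boundary can recover tokens, can be sketched with a toy target model. This is a simplified illustration under greedy verification, not the D^2SD implementation: `toy_target` and `count_accepted` are hypothetical names, and a real system verifies all positions in one batched forward pass rather than in a Python loop.

```python
def count_accepted(draft, target_next, context):
    """Number of leading draft tokens that match the target's greedy choice.

    Everything after the first mismatch is discarded.
    """
    prefix = list(context)
    accepted = 0
    for token in draft:
        if token != target_next(prefix):
            break
        prefix.append(token)
        accepted += 1
    return accepted


def toy_target(prefix):
    """Hypothetical deterministic target model: counts upward modulo 10."""
    return (prefix[-1] + 1) % 10


context = [0]

# A single draft block: the third token is wrong, so only two are accepted.
first_draft = [1, 2, 9, 4]
n_first = count_accepted(first_draft, toy_target, context)

# A second drafter re-anchors at the accepted prefix and proposes an
# alternative continuation; the combined candidate is accepted in full.
redraft = first_draft[:n_first] + [3, 4]
n_redraft = count_accepted(redraft, toy_target, context)
```

Here the boundary is found by running the verifier; the abstract instead describes predicting likely boundaries from per-position confidence scores so that the alternatives can be drafted before verification.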

2606.04444 2026-06-04 eess.IV cs.LG

Scaling Datasets for Multi-Sensor, Multi-Agent, and Multi-Domain Learning in Autonomous Systems

R. Spencer Hallyburton, David Hunt, Miroslav Pajic

Affiliations: Department of Electrical and Computer Engineering, Duke University

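AI summary: Proposes a modular dataset generation pipeline built on AVstack and CARLA that produces terabyte-scale, ground-truth-labeled, multi-domain data. It supports single- and multi-agent setups with flexible sensor configurations, for application-specific training and collaborative autonomy research.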

Abstract

Existing datasets cannot support large-scale learning in multi-agent, multi-sensor, or multi-domain autonomy, where diversity and coordination are essential. We present a modular dataset generation pipeline that creates terabyte-scale, ground-truth-labeled data for ground, aerial, and infrastructure-based systems using the AVstack framework and CARLA simulator. Supporting single- and multi-agent configurations with flexible sensor suites, the pipeline enables controllable experimentation across challenging conditions. Representative perception and fusion studies show how generated data can support application-specific training and collaborative autonomy.

2606.04429 2026-06-04 stat.ML cs.LG

Flatness and Generalization: Learning Multi-Index Models with Homogeneous Neural Networks

Harsh Vardhan, Hossein Taheri, Arya Mazumdar

Affiliations: Department of Computer Science; University of California, San Diego; Halicioğlu Data Science Institute

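AI summary: Studies the relationship between flatness and generalization when two-layer homogeneous neural networks learn multi-index models. It proves that the flattest interpolators always generalize, while certain non-generalizing interpolators cannot be made close to the flattest possible.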

Abstract

A common heuristic used to explain the generalization of first-order gradient methods on non-convex neural networks is that "flat interpolators generalize well" (Hochreiter and Schmidhuber, 1994; Keskar et al., 2017), where flatness can be measured by the trace of the Hessian of the empirical loss. However, Dinh et al. (2017) showed that, using symmetries of the network that can change flatness while keeping the population and empirical losses unchanged, any interpolator can be made sharper or flatter. This result makes the earlier heuristic statement vacuous. In this paper, we show that for learning an unknown multi-index model with $2$-layer non-convex homogeneous neural networks, there is a connection between flatness and generalization, despite the existence of symmetries. This connection pertains to the "flattest" interpolators, i.e., the interpolators that have orderwise minimum flatness among all interpolators. First, we show that there exists a natural class of non-generalizing interpolators whose flatness cannot be made closer to the flattest possible, even using symmetries. Second, we show that for data generated by a sum of single-index models, if the approximation error and label noise are low, any flattest interpolator achieves small population loss, i.e., the flattest interpolators always generalize. This establishes a direct link between flatness and generalization which applies to a large class of activations and realistic data distributions.
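
The rescaling symmetry and the notion of a "flattest" representative can be made concrete on a toy example. The worked derivation below is an illustration only, under assumptions that are not the paper's setting: a scalar two-parameter model $f(x) = a w x$, a single sample $(x, y)$ with $y \neq 0$, and squared loss.

```latex
% Toy model (assumed for illustration): f(x) = a w x, one sample (x, y), y != 0.
\[
L(a, w) = \tfrac{1}{2}\,(a w x - y)^2 .
\]
% At an interpolator, a w x = y, so the residual term of the Hessian vanishes:
\[
\nabla^2 L = x^2
\begin{pmatrix} w^2 & a w \\ a w & a^2 \end{pmatrix},
\qquad
\operatorname{tr} \nabla^2 L = x^2\,(a^2 + w^2).
\]
% The rescaling (a, w) -> (a/c, c w), c > 0, leaves f and the loss unchanged, but
\[
\operatorname{tr} \nabla^2 L \,\Big|_{(a/c,\; c w)}
= x^2 \left( \frac{a^2}{c^2} + c^2 w^2 \right)
\;\ge\; 2\, x^2\, |a w| ,
\]
% by the AM-GM inequality, with equality iff c^2 = |a| / |w|.
```

The trace can be driven arbitrarily high by taking $c \to 0$ or $c \to \infty$, which is the sense in which any interpolator can be made sharper, while the lower bound $2 x^2 |a w|$ is attained only by the balanced representative. That minimum over the symmetry orbit is the kind of quantity the "flattest interpolator" statements refer to.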

2606.04425 2026-06-04 cs.CR cs.AI

What If Prompt Injection Never Left? Exploring Cross-Session Stored Prompt Injection in Agentic Systems

Yuanbo Xie, Tianyun Liu, Yingjie Zhang, Suchen Liu, Yulin Li, Liya Su, Tingwen Liu

Affiliations: Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences; AI Sec Lab, Beijing Chaitin Technology Co., Ltd

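AI summary: Introduces cross-session stored prompt injection, in which persistent state turns prompt injection from a single-session, model-level threat into a long-lived, system-level vulnerability, and builds a taxonomy, a benchmark, and a sandbox toolkit to evaluate the risk.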

Comments position paper

Abstract

Modern agentic systems transform LLMs from session-bounded assistants into stateful systems that persist and evolve shared world state across sessions through memories, filesystems, tools, and other long-lived contextual artifacts. This shift fundamentally expands the attack surface of prompt injection. However, prior works on prompt injection have largely focused on model-level threats within a single session, overlooking how cross-session persistent system state fundamentally changes the system-level risk of agentic systems. Inspired by stored cross-site scripting in web systems, we introduce cross-session stored prompt injection, where a successful injection can persist within agentic system state and silently influence future executions long after the original attacker interaction has ended. To systematically study this threat, we formalize stored prompt injection and develop a taxonomy of how adversarial content persists and affects agentic systems across sessions. We further develop a benchmark and sandbox toolkit to evaluate the risks of stored prompt injection, enabling quantitative analysis of attack success across different models, attack goals, and persistence channels. Our findings highlight that persistence transforms prompt injection from an ephemeral model-level threat into a long-lived system-level vulnerability embedded within agent execution state. We hope this work draws broader attention to this emerging threat and motivates the community to systematically study and mitigate system risks arising from persistence in agentic systems.

2606.04419 2026-06-04 eess.IV cs.AI cs.CV physics.med-ph

L-TGVN: Leveraging Longitudinal Priors for Personalized Rapid MRI

Arda Atalık, Sumit Chopra, Daniel K. Sodickson

Affiliations: NYU Center for Data Science; Center for Advanced Imaging Innovation and Research (CAI²R); Courant Institute of Mathematical Sciences; Function Health

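AI summary: Proposes L-TGVN, a variational network that uses longitudinal priors as side information to reconstruct the current scan from heavily undersampled measurements. It requires no explicit registration, accommodates protocol differences, and outperforms baselines on quantitative metrics and structure preservation.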

Comments Accepted to MICCAI 2026

Abstract

MRI provides excellent soft-tissue contrast without ionizing radiation, but long acquisition times increase patient discomfort while also raising exam costs and limiting scanner throughput. A common approach to reduce scan time is to acquire fewer measurements, which yields an ill-posed linear inverse problem; recovering diagnostic-quality images therefore requires incorporating prior knowledge beyond the measured data. In follow-up exams, the most recent prior scan of a patient can provide a highly informative subject-specific context, but practical use is complicated by temporal changes (including pathology progression), misalignment between scans, and protocol drift across acquisitions. In this work, we introduce L-TGVN, a Longitudinal Trust-Guided Variational Network that leverages prior scans as side information to reconstruct the current scan from heavily undersampled measurements. Crucially, L-TGVN constrains the influence of prior scans to be consistent with the acquired measurements. Unlike many existing longitudinal reconstruction methods, it does not require explicit pre-registration between prior and current scans. It further accommodates differences in acquisition protocols across visits (e.g., changes in sequence parameters). We evaluate L-TGVN against matched-capacity baselines, including prior-guided methods and methods that do not use longitudinal priors, and observe consistent improvements in standard quantitative metrics together with better preservation of fine structures at challenging accelerations. Source code is available at github.com/sodicksonlab/L-TGVN.

2606.04388 2026-06-04 cs.CR cs.AI cs.LG

TITAN-FedAnil+: Trust-Based Adaptive Blockchain Federated Learning for Resource-Constrained Intelligent Enterprises

TITAN-FedAnil+:面向资源受限智能企业的基于信任的自适应区块链联邦学习

Muhammad Hadi, Muhammad Jahangir, Talha Shafique, Muhammad Khuram Shahzad

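Affiliations * University of Science and Technology of China

AI summary Proposes the TITAN-FedAnil+ framework, which filters malicious updates through affinity-propagation-based adaptive clustered aggregation, improves efficiency with GPU-accelerated vectorization, and achieves lightweight blockchain resynchronization via a signed state jump mechanism, cutting memory overhead by up to 81% on resource-constrained edge devices.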

Comments 8 pages, 5 figures; code available at https://github.com/error8149/FedAnilPlus-Optimized

Details
AI Chinese abstract

联邦学习(FL)已成为一种在保护数据隐私的同时实现协作智能的有效范式。然而,由非独立同分布(non-IID)数据分布引起的数据异构性和去中心化安全威胁仍然是重大挑战,尤其是在资源受限的企业环境中。本文提出了TITAN-FedAnil+,一种面向智能企业中区块链联邦学习的基于信任的自适应网络。所提出的框架引入了基于亲和传播的自适应聚类聚合,无需预先知道攻击者数量即可识别并过滤恶意更新。此外,采用GPU加速向量化以提高计算效率,同时通过有符号状态跳变机制实现轻量级区块链重同步。实验结果表明,与基线框架相比,在受限的8 GB边缘设备上经过50轮通信,内存开销显著降低,节省高达81%。结果表明,TITAN-FedAnil+有效提升了智能企业环境中安全联邦学习部署的鲁棒性、可扩展性和资源效率。

English abstract

Federated Learning (FL) has emerged as an effective paradigm for collaborative intelligence while preserving data privacy. However, data heterogeneity arising from non-IID distributions and decentralized security threats remain significant challenges, particularly in resource-constrained enterprise environments. This paper presents TITAN-FedAnil+, a Trust-Based Adaptive Network for blockchain-enabled federated learning in intelligent enterprises. The proposed framework introduces affinity propagation-based adaptive clustered aggregation to identify and filter malicious updates without requiring prior knowledge of the number of attackers. In addition, GPU-accelerated vectorization is employed to improve computational efficiency, while a signed state jump mechanism enables lightweight blockchain resynchronization. Experimental results demonstrate substantial reductions in memory overhead, achieving up to 81% savings across 50 communication rounds on constrained 8 GB edge devices compared with the baseline framework. The results indicate that TITAN-FedAnil+ effectively improves robustness, scalability, and resource efficiency for secure federated learning deployments in intelligent enterprise environments.

2606.04387 2026-06-04 cs.IR cs.AI

Rethinking Sales Lead Scoring with LLM-based Hierarchical Preference Ranking

重新思考基于LLM的分层偏好排名的销售线索评分

Chenyu Zhang, Yiwen Liu, Yin Sun, Xinyuan Zhang, Yuji Cao, Junming Jiao, Juyi Qiao

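Affiliations * Intelligent Business Team, Li Auto Inc.

AI summary For sales lead conversion in high-stakes domains, proposes an LLM-based discriminative framework together with HPRO (Hierarchical Preference Ranking Optimization), which jointly models structured and unstructured data and improves both scoring and ranking performance.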

Details
AI Chinese abstract

在高价值领域(如汽车、房地产)中,销售线索转化与电子商务推荐有根本不同,因为其决策周期长且涉及多阶段漏斗。传统的线索评分方法(基于规则的评分卡、机器学习或逐点CTR模型)面临严重挑战:监督信号稀疏、非结构化CRM日志中的语义鸿沟,以及无法捕捉线索的相对优先级。虽然大型语言模型(LLM)能够对客户交互提供卓越的语义理解,但通用LLM不适合线索排名:它们生成文本而非可比较的分数,并且缺乏与销售漏斗分层优先级的一致性。我们提出了一种基于LLM的判别式框架用于销售线索评分,该框架支持结构化CRM特征和非结构化客户交互的联合建模。在此框架之上,我们提出了HPRO(分层偏好排名优化),通过分层偏好排名目标增强销售线索评分。HPRO采用边际感知的Bradley-Terry公式,将稀疏的二元标签转换为密集的、漏斗感知的偏好对,使线索评分能够同时利用逐点和成对监督。在来自领先新能源汽车品牌的大规模数据上的实验表明,分类性能达到最先进水平(AUC 0.8161),排名性能提升(排名靠前线索的精确度提高39.7%)。为期132天的在线A/B测试验证了9.5%的销量提升,确认了实际的商业影响。

English abstract

Sales lead conversion in high-stakes domains (e.g., automotive, real estate) differs fundamentally from e-commerce recommendation due to prolonged decision cycles and multi-stage funnels. Traditional lead scoring methods (rule-based scorecards, machine learning, or pointwise CTR models) face severe challenges: sparse supervision, a semantic gap in unstructured CRM logs, and an inability to capture relative lead priority. While Large Language Models (LLMs) offer superior semantic understanding of customer interactions, general-purpose LLMs are ill-suited for lead ranking: they generate text rather than comparable scores and lack alignment with the hierarchical priorities of sales funnels. We introduce an LLM-based discriminative framework for sales lead scoring, which supports joint modeling of structured CRM features and unstructured customer interactions. On top of this framework, we propose HPRO (Hierarchical Preference Ranking Optimization), which augments sales lead scoring with a hierarchical preference ranking objective. HPRO employs a margin-aware Bradley-Terry formulation to transform sparse binary labels into dense, funnel-aware preference pairs, enabling lead scoring to leverage both pointwise and pairwise supervision. Experiments on large-scale data from a leading NEV brand demonstrate state-of-the-art classification (AUC 0.8161) and ranking performance (+39.7% precision among top-ranked leads). A 132-day online A/B test validates a 9.5% sales volume uplift, confirming real-world commercial impact.
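
The step from sparse binary labels to dense, funnel-aware preference pairs can be illustrated with a minimal sketch. The abstract gives no formula, so the funnel stage names, the rule that the margin grows linearly with the stage gap, and the function names below are all assumptions for illustration, not the authors' implementation.

```python
import math
from itertools import combinations

# Hypothetical funnel stages, ordered from least to most advanced.
FUNNEL_LEVEL = {"no_contact": 0, "test_drive": 1, "order": 2}


def preference_pairs(stages):
    """Return (winner, loser, gap) index triples for leads at different funnel depths."""
    pairs = []
    for i, j in combinations(range(len(stages)), 2):
        gap = FUNNEL_LEVEL[stages[i]] - FUNNEL_LEVEL[stages[j]]
        if gap > 0:
            pairs.append((i, j, gap))
        elif gap < 0:
            pairs.append((j, i, -gap))
    return pairs


def margin_bt_loss(scores, stages, margin_per_level=0.5):
    """Mean of -log sigmoid(s_winner - s_loser - margin); the margin (assumed) scales with the stage gap."""
    pairs = preference_pairs(stages)
    if not pairs:
        return 0.0
    total = 0.0
    for winner, loser, gap in pairs:
        z = scores[winner] - scores[loser] - margin_per_level * gap
        total += math.log1p(math.exp(-z))  # equals -log(sigmoid(z))
    return total / len(pairs)


# One converted lead among four still yields five ordered pairs.
stages = ["order", "test_drive", "no_contact", "no_contact"]
well_ordered = margin_bt_loss([3.0, 1.5, 0.0, 0.0], stages)
mis_ordered = margin_bt_loss([0.0, 0.0, 1.5, 3.0], stages)
```

In this toy setting a pointwise model sees a single positive label, whereas the pairwise objective receives a training signal from every pair of leads at different funnel depths.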

2606.04382 2026-06-04 cs.DL cs.AI cs.IR

LCSHBench: A Multilingual, Consensus-Grounded Benchmark for Library of Congress Subject Heading Assignment

LCSHBench:一个多语言、共识基础的国会图书馆主题标目分配基准

Kwok Leong Tang

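Affiliations * Library of Congress

AI summary Introduces the LCSHBench benchmark, a multilingual set of bibliographic records built on multi-library consensus, which evaluates automated subject cataloging by exact and concept matching and shows that a low-rank fine-tuned embedder improves cross-lingual retrieval.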

Details
AI Chinese abstract

自动主题编目为书目记录分配受控词汇标目,但LCSH缺乏标准的公开基准。我们引入LCSHBench:来自哈佛、哥伦比亚和普林斯顿开放许可目录的15种语言的22,346本书。只有当至少两个独立编目机构分配了LCSH时,记录才被纳入;我们发布每个目录的来源以及联合和一致答案视图。对465,187部由三个图书馆编目的作品进行的一致性研究显示了这种设计的重要性:图书馆通常在底层主题上达成一致(93.3%共享概念级标目),但在确切表达上经常不同(39.4%具有相同的标目集)。因此,LCSHBench通过开放词汇生成和全词汇检索,使用按语言和标目类型分解的集合和排名指标,对精确匹配和概念匹配进行评分。作为首次演示,对300M设备端嵌入器的低秩微调改进了跨语言检索,并在开发集上的精确召回率@200(0.659 vs 0.623)超过了3,072维托管嵌入器。语言面板显示增益并不均匀,保留测试和端到端确认仍是未来工作。

English abstract

Automated subject cataloging assigns controlled-vocabulary headings to bibliographic records, but LCSH has no standard public benchmark. We introduce LCSHBench: 22,346 books in 15 languages from the openly licensed Harvard, Columbia, and Princeton catalogs. Records enter only when at least two independent cataloging agencies assigned LCSH; we release per-catalog provenance plus union and unanimous answer views. A concordance study of 465,187 works cataloged by all three libraries shows why this design matters: libraries usually agree on the underlying topic (93.3% share a concept-level heading) but often differ in exact expression (39.4% have identical heading sets). LCSHBench therefore scores both exact and concept matches, with set and rank metrics broken down by language and heading type, across open-vocabulary generation and full-vocabulary retrieval. As a first demonstration, a low-rank fine-tune of a 300M on-device embedder improves cross-lingual retrieval and beats a 3,072-dimensional hosted embedder on development exact recall@200 (0.659 vs 0.623). The language panel shows the gain is not uniform, and held-out-test and end-to-end confirmation remain future work.
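
The difference between exact and concept matching can be sketched as follows. The abstract does not say how concept-level headings are derived, so this sketch assumes a simple proxy: the main heading before the first `--` subdivision. The function names and the example headings are hypothetical.

```python
def concept_key(heading):
    """Reduce a heading to its main heading (text before the first '--' subdivision)."""
    return heading.split("--", 1)[0].strip().lower()


def recall_at_k(ranked, gold, k, key=lambda h: h):
    """Fraction of gold headings matched within the top-k ranked predictions."""
    top = {key(h) for h in ranked[:k]}
    gold_keys = {key(h) for h in gold}
    if not gold_keys:
        return 0.0
    return len(gold_keys & top) / len(gold_keys)


# Hypothetical record: the system finds the right topic but omits a subdivision.
gold = ["Cooking, French--History", "Gastronomy"]
ranked = ["Cooking, French", "Gastronomy", "Food habits--France"]

exact_recall = recall_at_k(ranked, gold, k=2)
concept_recall = recall_at_k(ranked, gold, k=2, key=concept_key)
```

Under this proxy the prediction misses one heading on exact match yet covers both underlying topics, which mirrors the gap the concordance study reports between concept-level and exact agreement.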

2606.04380 2026-06-04 stat.ML cs.LG

REGAIN: REconciliation GAIN-driven Auxiliary Direction Learning

REGAIN:基于调和增益的辅助方向学习

Weijia Li, Shun Hu, Yanfei Kang

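Affiliations * School of Mathematical Sciences, Beihang University, Beijing, China; School of Economics and Management, Beihang University, Beijing, China

AI summary Proposes the REGAIN framework, which learns normalized auxiliary directions, forecasts them with a frozen forecasting oracle, and selects directions by target-weighted loss reduction to improve forecast reconciliation.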

Details
AI Chinese abstract

预测调和通常从固定测量系统开始,询问如何将预测投影到一致空间。我们提出不同问题:哪些额外的线性测量应被预测并纳入调和系统?我们提出REGAIN,一种调和增益框架,学习归一化辅助方向,用冻结预测预言机预测诱导序列,并通过增强广义最小二乘调和后的目标加权损失减少选择方向。与基于方差的分量或基于可预测性的辅助选择不同,REGAIN优化辅助测量对最终调和预测的下游影响。我们提供统计特征,表明有用的辅助方向必须提供关于未解决目标不确定性的互补信息,而不仅仅是易于预测。分析还阐明了协方差风险减少机制、偏差变化在实现二次风险中的作用以及估计增益信号的稳定性。开发了带有保留增益筛选的分阶段学习算法,以及可选的联合优化步骤。在北京PM2.5和澳大利亚旅游数据上的实验表明,增益选择的测量可以改进普通多变量和层次预测,特别是当它们揭示原始测量系统未捕捉的残差不确定性时。

English abstract

Forecast reconciliation usually starts from a fixed measurement system and asks how forecasts should be projected onto a coherent space. We ask a different question: which additional linear measurements should be forecast and included in the reconciliation system? We propose REGAIN, a reconciliation-gain framework that learns normalized auxiliary directions, forecasts the induced series with a frozen forecasting oracle, and selects directions by their target-weighted loss reduction after augmented generalized least-squares reconciliation. Unlike variance-based components or predictability-based auxiliary selection, REGAIN optimizes the downstream effect of an auxiliary measurement on the final reconciled forecasts. We provide a statistical characterization showing that useful auxiliary directions must provide complementary information about unresolved target uncertainty, rather than merely being easy to forecast. The analysis also clarifies the covariance-risk reduction mechanism, the role of bias changes in realized quadratic risk, and the stability of estimated gain signals. A stagewise learning algorithm with held-out gain screening is developed, together with an optional joint refinement step. Experiments on Beijing PM2.5 and Australian Tourism data show that gain-selected measurements can improve both ordinary multivariate and hierarchical forecasts, especially when they reveal residual uncertainty not captured by the original measurement system.

2606.04374 2026-06-04 cs.IR cs.AI

DSIRM: Learning Query-Bridged Discrete Semantic Identifiers for E-commerce Relevance Modeling

DSIRM:学习查询桥接的离散语义标识符用于电商相关性建模

Bokang Wang, Xing Fang, Mingmin Jin, Jing Wang, Zhentao Song, Guangxin Song, Jianbo Zhu

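Affiliations * Taobao & Tmall Group of Alibaba

AI summary To address the difficulty continuous embeddings have in capturing fine-grained attribute distinctions in e-commerce search, proposes the Discrete Semantic Identifier Relevance Model (DSIRM), which uses query-bridged contrastive quantization to inject query-item interaction supervision and learn relevance-aware partitions, and uses generative LLMs to predict item SIDs, improving offline AUC by 1.54%.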

Comments Jing Wang (Corresponding Author)

Details
AI Chinese abstract

尽管连续嵌入在电商搜索相关性方面取得了快速进展,但一个长期存在的难题是难以捕捉细粒度的属性区分。虽然离散语义标识符(SIDs)已被广泛采用作为有前景的替代方案,但现有的SID生成方法严重依赖无监督量化。在现实场景中,缺乏显式监督通常使得更难决定哪些物品应共享一个SID,导致查询依赖排序的能力有限。为了解决无监督SID的问题,我们提出显式建模离散相关性特征,并开发了离散语义标识符相关性模型(DSIRM)。具体而言,我们在物品侧提出了一种查询桥接的对比量化方法,将查询-物品交互监督注入残差量化中,以主动学习相关性感知的语义分区。另一方面,我们在查询侧探索生成式大语言模型,从文本中显式预测物品SID,解决长尾查询和意图模糊问题。查询和物品SID之间的层次前缀匹配产生了具有判别力的特征,完美补充了密集信号。在天猫生产数据上的大量实验结果表明,我们提出的方法取得了更好的结果,离线AUC提升了+1.54%。通过高效的混合架构部署,它实现了显著的在线提升(UCTR +0.13%,UCTCVR +0.25%),证明了其巨大的工业价值。

English abstract

Despite rapid progress of continuous embeddings for e-commerce search relevance, a long-standing open problem is the difficulty in capturing fine-grained attribute distinctions. While discrete Semantic Identifiers (SIDs) have been widely adopted as a promising alternative, existing SID generation methods rely heavily on unsupervised quantization. In realistic scenarios, the lack of explicit supervision often makes it more difficult to dictate which items should share an SID, resulting in limited capability for query-dependent ranking. To address the issue of unsupervised SIDs, we propose to explicitly model discrete relevance features and develop a Discrete Semantic Identifier Relevance Model (DSIRM). Specifically, we present a query-bridged contrastive quantization approach on the item side, injecting query-item interaction supervision into Residual Quantization to actively learn relevance-aware semantic partitions. On the other hand, we explore generative LLMs on the query side to explicitly predict item SIDs from text, resolving tail queries and intent ambiguity. Hierarchical prefix matching between query and item SIDs yields discriminative features that complement dense signals. Extensive experimental results on Tmall's production data show that our proposed approach improves offline AUC by 1.54%. Deployed via an efficient hybrid architecture, it achieves significant online lifts (+0.13% UCTR, +0.25% UCTCVR), demonstrating its industrial value.
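
Hierarchical prefix matching between a query SID and an item SID can be sketched in a few lines. This sketch assumes an SID is a tuple of codebook indices ordered from coarse to fine, as residual quantization would produce; the feature layout and function names are hypothetical, not taken from the paper.

```python
def shared_prefix_depth(query_sid, item_sid):
    """Count the leading codebook levels on which the two SIDs agree."""
    depth = 0
    for q_code, i_code in zip(query_sid, item_sid):
        if q_code != i_code:
            break
        depth += 1
    return depth


def prefix_match_features(query_sid, item_sid):
    """One binary feature per level: 1 if the SIDs agree on every level up to and including it."""
    depth = shared_prefix_depth(query_sid, item_sid)
    return [1 if level < depth else 0 for level in range(len(query_sid))]


# Hypothetical three-level SIDs: same coarse and middle codes, different fine code.
features = prefix_match_features((12, 7, 3), (12, 7, 9))
```

A deeper shared prefix means the query and item fall into the same finer-grained partition, which is the kind of discrete signal a ranker could combine with dense similarity scores.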

2606.04370 2026-06-04 eess.AS cs.SD eess.SP

Masked Wavelet Scattering Transform Neural Field for Sound Field Reconstruction

掩蔽小波散射变换神经场用于声场重建

Xinmeng Luan, Samuel A. Verburg, Efren Fernandez-Grande, Gary Scavone

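Affiliations * Fonds de recherche du Québec – Nature et technologies

AI summary Proposes a method for sound field reconstruction from sparse observations that uses the wavelet scattering transform as a multi-scale feature extractor, combined with neural-field optimization and a masking strategy, and validates it on HRTF upsampling.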

Comments 5 pages, 2 figures, conference

Details
AI Chinese abstract

在本文中,我们提出了一种重建框架,利用小波散射变换(WST)作为多尺度特征提取器,在稀疏观测条件下施加统计先验。重建问题被表述为一个优化任务,并使用神经场求解,将WST纳入训练损失函数。作为概念验证,我们在HRTF上采样上验证了所提出的方法。对WST系数应用掩蔽策略,形成两阶段过程。第一阶段从小的多受试者数据集中学习二元掩码,第二阶段将学习到的掩码应用于单个HRTF的WST系数,以在重建过程中保留信息性统计结构。与基线方法的验证(同时也作为框架不同组件的消融研究)证明了所提出方法的有效性。

English abstract

In this paper, we propose a reconstruction framework that leverages the Wavelet Scattering Transform (WST) as a multi-scale feature extractor to impose statistical priors under sparse observation conditions. The reconstruction problem is formulated as an optimization task and solved using a neural field, with the WST incorporated into the training loss function. As a proof of concept, we validate the proposed method on HRTF upsampling. A masking strategy is applied to the WST coefficients, resulting in a two-phase procedure. The first phase learns a binary mask from a small multi-subject dataset, while the second phase applies the learned mask to the WST coefficients of an individual HRTF to preserve informative statistical structures during reconstruction. Validation against baseline methods, which also serve as an ablation study of the different components of the framework, demonstrates the effectiveness of the proposed approach.

2606.04362 2026-06-04 cs.IR cs.CL

Disentangling Answer Engine Optimization from Platform Growth: A Log-Based Natural Experiment on ChatGPT Referral Traffic

解耦答案引擎优化与平台增长:基于日志的ChatGPT推荐流量自然实验

Keisuke Watanabe, Kazuki Nakayashiki

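Affiliations * Glasp Inc.

AI summary Using a natural experiment with untreated pages on the same domain as a control, this study separates the causal effect of Answer Engine Optimization (AEO) on referral traffic from the confounding effect of the platform's own growth.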

Comments 9 pages, 4 figures, 1 table

Details
AI Chinese abstract

大型语言模型(LLM)“答案引擎”(如ChatGPT)现在向开放网络发送可测量的推荐流量,一种类似于搜索引擎优化的实践——此处称为答案引擎优化(AEO)——已经出现。公开的AEO成功案例通常引用巨大的原始增长倍数,但原始推荐增长被答案引擎本身的快速平台级增长所混淆。我们报告了一项针对单个高流量域名(glasp.co)的纵向现场研究,该域名拥有数十万个YouTube问答页面,在2026年1月接受了一组明确的AEO干预(详见第4节)。由于干预集中在网站的一个子集上,同一域内未处理的剩余部分作为同期对照,吸收了平台尾风。使用第一方分析和服务器日志而非概率性第三方估计,我们发现:(1)原始增长由平台尾风主导:在月度汇总中,ChatGPT总推荐量增长了5.7倍,而同一域内未处理页面在同一时间段内增长了3.5倍;(2)对每周处理/对照比率的中断时间序列模型估计出一个离散的、与干预对齐的水平增长1.82倍(95% CI 1.31-2.54,HAC p=0.001),该结果在参与度过滤流量(2.27倍)和替代规格下稳健;(3)然而,保守的安慰剂时间置换检验得出p=0.16,因此该效应是提示性的而非结论性的,鉴于前期短且噪声大;(4)Google对处理页面的自然点击并未超出整体网站趋势下降,且索引得以保留,这与SEO保护规则一致。方法论上的信息——通过域内对照分离处理与平台尾风——比任何单一倍数更重要,并意味着标题中的AEO倍数大大高估了因果效应。

English abstract

Large language model (LLM) "answer engines" such as ChatGPT now send measurable referral traffic to the open web, and a practice analogous to search engine optimization, here called Answer Engine Optimization (AEO), has emerged. Public AEO success stories typically quote large raw growth multiples, but raw referral growth is confounded by the rapid platform-level growth of the answer engines themselves. We report a longitudinal field study on a single high-traffic domain (glasp.co) whose corpus of hundreds of thousands of YouTube question-and-answer pages received a defined bundle of AEO interventions in January 2026 (detailed in Section 4). Because the interventions were concentrated on one subset of the site, the untreated remainder of the same domain acts as a contemporaneous control that absorbs the platform tailwind. Using first-party analytics and server logs rather than probabilistic third-party estimators, we find: (1) raw growth is dominated by the platform tailwind: on monthly aggregates total ChatGPT referrals grew 5.7x while untreated pages on the same domain grew 3.5x over the same window; (2) an interrupted time-series model on the weekly treated/control ratio estimates a discrete, intervention-aligned level increase of 1.82x (95% CI 1.31-2.54, HAC p=0.001), robust across engagement-filtered traffic (2.27x) and alternative specifications; (3) however, a conservative placebo-in-time permutation test yields p=0.16, so the effect is suggestive, not conclusive, given a short and noisy pre-period; and (4) Google organic clicks to treated pages did not fall beyond the ambient site-wide trend and indexation was preserved, consistent with the SEO-protection rule. The methodological message, separating treatment from platform tailwind with an on-domain control, matters more than any single multiple, and implies that headline AEO multiples substantially overstate causal effect.
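
The idea of using the treated/control ratio to cancel the platform tailwind can be shown with a deliberately simplified sketch. It compares the average log ratio before and after the intervention; the paper's interrupted time-series model, trend terms, and HAC standard errors are not reproduced here. The weekly series are made-up numbers and the function name is hypothetical.

```python
import math


def level_shift(treated, control, intervention_week):
    """Ratio of geometric-mean treated/control ratios after vs. before the intervention."""
    log_ratio = [math.log(t / c) for t, c in zip(treated, control)]
    pre = log_ratio[:intervention_week]
    post = log_ratio[intervention_week:]
    return math.exp(sum(post) / len(post) - sum(pre) / len(pre))


# Hypothetical data: the platform tailwind lifts all pages by 10% per week,
# and treated pages double relative to control from week 4 onward.
control = [100 * 1.1 ** week for week in range(8)]
treated = [c * (0.5 if week < 4 else 1.0) for week, c in enumerate(control)]

raw_growth = treated[-1] / treated[0]
shift = level_shift(treated, control, intervention_week=4)
```

In this toy series the raw growth of the treated pages is close to 3.9x while the ratio-based level shift is 2x, because the ratio removes the growth that both groups share.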

2606.04361 2026-06-04 eess.SY cs.MA cs.RO cs.SY math.DS math.OC

When Freshness Is Not Enough: Distribution-Aware Age of Information for Networked LQR Control

当新鲜度不足时:面向网络化LQR控制的分布感知信息年龄

Abdullah Y. Etcibasi, C. Emre Koksal, Eylem Ekici

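Affiliations * Department of Electrical and Computer Engineering, The Ohio State University

AI summary Shows that in networked control systems, minimizing mean Age of Information (AoI) alone is not enough to optimize LQR tracking performance; the full distribution of inter-scheduling intervals, including higher-order and exponential moments, must be considered.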

Details
AI Chinese abstract

信息年龄(AoI)已成为无线更新系统设计的核心指标,尤其是在新鲜测量支持跟踪、估计和控制的场景中。尽管其广泛应用,但将平均AoI或峰值AoI作为闭环性能的替代指标通常基于直觉而非控制理论推导。本文探讨了最小化平均AoI是否对网络化控制系统最优。对于具有延迟间歇更新的标量线性时不变系统,我们证明,在状态无关调度策略下,无限时域LQR跟踪问题可简化为对调度间隔分布的优化。所得目标函数依赖于调度过程的高阶统计矩,在不稳定或相关情况下还依赖于指数矩,而非仅依赖于其均值。因此,具有相同平均AoI的策略可能产生显著不同的跟踪成本。我们进一步将分析扩展到具有指数衰减自相关的扰动,并推导出揭示完整间隔分布作用的等效成本公式。最后,使用NGSIM US-101数据集中的真实车辆轨迹验证理论。实证结果与预测的性能趋势一致,表明仅凭平均AoI不足以进行面向控制的网络设计。

English abstract

Age of Information (AoI) has become a central metric for the design of wireless update systems, especially in applications where fresh measurements support tracking, estimation, and control. Despite its popularity, the use of mean AoI or peak AoI as a surrogate for closed-loop performance is often motivated by intuition rather than by a control-theoretic derivation. This paper examines whether minimizing the mean AoI is in fact optimal for networked control systems. For scalar linear time-invariant systems with delayed intermittent updates, we show that, under state-independent scheduling policies, the infinite-horizon LQR tracking problem reduces to an optimization over the distribution of inter-scheduling intervals. The resulting objective depends on higher-order statistical moments, and in unstable or correlated regimes on exponential moments, of the inter-scheduling process rather than only on its mean. Consequently, policies with identical mean AoI can induce substantially different tracking costs. We further extend the analysis to disturbances with exponentially decaying autocorrelation and derive equivalent cost formulations that expose the role of the full interval distribution. Finally, we validate the theory using real vehicle trajectories from the NGSIM US-101 dataset. The empirical results match the predicted performance trends, demonstrating that mean AoI alone is insufficient for control-oriented network design.
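
A toy calculation illustrates why two policies with the same mean AoI can differ in tracking cost. It is not the paper's derivation: it assumes a scalar system x[k+1] = a*x[k] + w[k] with unit-variance noise, zero delivery delay, perfect measurements, and an age that counts 0, 1, ..., X-1 within each inter-update interval X. Both interval distributions are chosen by hand, and they also differ in update rate.

```python
def mean_age(interval_pmf):
    """Time-average age when the age resets to 0 at each update."""
    num = sum(p * x * (x - 1) / 2 for x, p in interval_pmf.items())
    den = sum(p * x for x, p in interval_pmf.items())
    return num / den


def error_variance(age, a, noise_var=1.0):
    """Variance of the open-loop prediction error after `age` steps without an update."""
    return noise_var * sum(a ** (2 * j) for j in range(age))


def mean_cost(interval_pmf, a):
    """Time-average error variance under i.i.d. inter-update intervals."""
    num = sum(
        p * sum(error_variance(d, a) for d in range(x))
        for x, p in interval_pmf.items()
    )
    den = sum(p * x for x, p in interval_pmf.items())
    return num / den


periodic = {3: 1.0}              # an update every 3 steps
bursty = {1: 2 / 3, 4: 1 / 3}    # mostly back-to-back updates, occasionally a long gap

ages = (mean_age(periodic), mean_age(bursty))
costs_marginal = (mean_cost(periodic, a=1.0), mean_cost(bursty, a=1.0))
costs_unstable = (mean_cost(periodic, a=1.5), mean_cost(bursty, a=1.5))
```

With a = 1 the error variance grows linearly in the age, so the two costs coincide with the shared mean age. With a = 1.5 the variance grows geometrically and the occasional long gap in the bursty policy dominates, so its cost is higher despite the identical mean AoI.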

2606.04329 2026-06-04 cs.CR cs.AI

From Untrusted Input to Trusted Memory: A Systematic Study of Memory Poisoning Attacks in LLM Agents

从不可信输入到可信内存:LLM智能体中内存投毒攻击的系统研究

Pritam Dash, Tongyu Ge, Aditi Jain, Tanmay Shah, Zhiwei Shang

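Affiliations * University of California, Berkeley

AI summary Systematically studies memory poisoning attacks on LLM-based agents: identifies four memory write channels and nine structural vulnerabilities, proposes a taxonomy of six attack classes, and designs the MPBench benchmark, finding that agents that write and retrieve memory more aggressively are more exploitable and that existing prompt injection defenses do not cover memory poisoning.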

Details
AI Chinese abstract

内存是AI智能体的核心组件,使其能够在交互中积累知识并提高性能。然而,持久性内存引入了内存投毒的风险,即单个对抗性内存写入可以对智能体行为产生长期影响。我们对基于LLM的智能体中的内存投毒进行了系统研究。我们识别了四种内存写入通道和九种模型能力、系统提示设计以及智能体系统架构中的结构漏洞,这些漏洞使得这些通道可被利用。基于这些漏洞,我们提出了六类内存投毒攻击的分类法。此外,我们设计了MPBench——一个用于评估内存投毒攻击的基准,并表明设计为更积极读写和检索内存的智能体更容易被利用。我们还表明,现有的提示注入防御无法覆盖内存投毒攻击。我们的发现为理解和缓解针对AI智能体的内存投毒攻击提供了基础。

English abstract

Memory is a core component of AI agents, enabling them to accumulate knowledge across interactions and improve performance. However, persistent memory introduces the risk of memory poisoning, where a single adversarial memory write can exert long-term influence over agent behavior. We present a systematic study of memory poisoning in LLM-based agents. We identify four memory write channels and nine structural vulnerabilities in model capabilities, system prompt design, and agent system architecture that make these channels exploitable. Based on these vulnerabilities, we develop a taxonomy of six classes of memory poisoning attacks. Furthermore, we design MPBench, a benchmark for evaluating memory poisoning attacks, and show that agents designed to write and retrieve memory more aggressively are more exploitable. We also show that existing prompt injection defenses fail to cover memory poisoning attacks. Our findings provide a foundation for understanding and mitigating memory poisoning attacks against AI agents.

2606.04328 2026-06-04 cs.NI cs.AI

Generalizable Multi-Task Learning for Wireless Networks Using Prompt Decision Transformers

基于提示决策变压器的无线网络可泛化多任务学习

Fatih Temiz, Shavbo Salehi, Melike Erol-Kantarci

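Affiliations * IEEE University of California, Berkeley

AI summary Proposes PromptDT, a framework that recasts multi-cell selection as a sequence modeling problem and uses offline trajectories and task-specific prompts for scalable learning across heterogeneous network configurations, improving multi-task QoE by up to 49% and adapting to unseen configurations without retraining.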

Comments Accepted paper at IEEE International Mediterranean Conference on Communications and Networking (MeditCom) 2026

Details
AI Chinese abstract

未来无线网络需要快速适应高度异构的环境和动态任务配置,这要求从传统的基于规则和优化的无线资源管理(RRM)转向人工智能(AI)驱动的RRM。AI驱动的方法可以学习复杂的非线性关系,泛化到不同的网络条件,并实现实时、可扩展和自主的决策。在RRM技术中,协调多点(CoMP)传输对于减轻小区间干扰和提升小区边缘性能至关重要,从而在密集部署中改善体验质量(QoE)。然而,最优多小区选择仍然是一个复杂的组合挑战,因为它需要在动态流量和信道条件下联合优化许多可能的服务小区组合。尽管取得了成功,但传统的深度强化学习(DRL)方法,如近端策略优化(PPO),在状态和动作空间变化时存在样本效率低、泛化能力有限和重新训练成本高的问题。为了解决这些瓶颈,我们提出了一种基于提示决策变压器(PromptDT)的多任务学习框架,该框架能够跨不同网络配置学习,并将多小区选择重构为序列建模问题。通过利用离线轨迹和任务特定提示,PromptDT实现了跨不同网络配置(包括变化的基站和用户设备数量以及调度策略)的可扩展学习。实验结果表明,与基线相比,PromptDT在多任务设置中将QoE提高了高达49%,且性能随模型容量正向扩展。此外,PromptDT能有效泛化到未见过的任务,实现对新网络配置的鲁棒少样本适应,无需重新训练或微调。

English abstract

Future wireless networks demand rapid adaptation to highly heterogeneous environments and dynamic task configurations, necessitating a shift from conventional rule-based and optimization-driven radio resource management (RRM) toward artificial intelligence (AI)-driven RRM. AI-driven approaches can learn complex nonlinear relationships, generalize across diverse network conditions and enable real-time, scalable and autonomous decision-making. Among RRM techniques, coordinated multipoint (CoMP) transmission is pivotal for mitigating inter-cell interference and enhancing cell-edge performance, thereby improving quality of experience (QoE) in dense deployments. However, optimal multi-cell selection remains a complex combinatorial challenge as it requires jointly optimizing over many possible serving-cell combinations under dynamic traffic and channel conditions. Despite their success, conventional deep reinforcement learning (DRL) methods such as proximal policy optimization (PPO) suffer from poor sample efficiency, limited generalization, and costly retraining when state and action spaces change. To address these bottlenecks, we propose a Prompt Decision Transformer (PromptDT) based multi-task learning framework capable of learning across diverse network configurations and reformulating multi-cell selection as a sequence modeling problem. By leveraging offline trajectories and task-specific prompts, PromptDT enables scalable learning across diverse network configurations, including varying base stations and user equipment counts, and scheduler policies. Experimental results demonstrate that PromptDT improves QoE by up to 49% in multi-task settings compared to baselines, with performance scaling positively alongside model capacity. Moreover, PromptDT generalizes effectively to unseen tasks, achieving robust few-shot adaptation to new network configurations without retraining or fine-tuning.

2606.04319 2026-06-04 cs.GR cs.CV

PureLight: Learning Complex Luminaires with Light Tracing

PureLight: 使用光线追踪学习复杂光源

Pedro Figueiredo, Zixuan Li, Beibei Wang, Miloš Hašan, Nima Khademi Kalantari

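Affiliations * Texas A&M University; Nankai University; Nanjing University of Science and Technology; NVIDIA

AI summary Proposes a neural formulation that learns the radiance distribution of complex luminaires with light tracing and a normalizing flow network, then distills it into a lightweight MLP for efficient rendering.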

Comments 9 pages, 10 figures

Details
AI Chinese abstract

我们提出了一种神经公式来估计复杂光源的外观。我们专注于具有复杂光传输(例如,被多个镜面层包围的小型发射器)的具有挑战性的光源,这些光源对于(双向)路径追踪来说很难处理。为此,我们使用光线追踪从发射器构建路径到出射表面,并将外观估计公式化为一个分布学习问题。具体来说,我们使用一个大型归一化流网络对出射表面上的出射辐射概率密度函数(pdf)进行建模,并将出射辐射恢复为估计的pdf与通量的乘积。为了实现高效推理,我们将学习到的外观蒸馏到一个轻量级MLP中,该MLP直接估计出射表面上的辐射。我们还训练了一个采样网络用于从光源进行有效的直接照明计算,以及一个混合网络将光源合成到场景中。我们的公式使得在任意场景中使用低样本数渲染具有挑战性的光源成为可能。

English abstract

We propose a neural formulation for estimating the appearance of complex luminaires. We focus on challenging luminaires with complex light transport (e.g., small emitters enclosed by multiple specular layers) that are difficult for (bidirectional) path tracing. To this end, we use light tracing to construct paths from emitters to the exit surfaces and formulate appearance estimation as a distribution learning problem. Specifically, we model the probability density function (pdf) of outgoing radiance on the exit surfaces using a large normalizing flow network, and recover the outgoing radiance as the product of the estimated pdf and flux. To enable efficient inference, we distill the learned appearance into a lightweight MLP that directly estimates radiance on the exit surfaces. We additionally train a sampling network for effective direct illumination computation from the luminaire, and a blending network to composite the luminaire into the scene. Our formulation makes it feasible to render challenging luminaires using low sample counts in arbitrary scenes.

2606.04317 2026-06-04 cs.CR cs.LG cs.SE

Toward a Generalized Defense Across Sparse, Continuous, and Structured Parameter Attacks

面向稀疏、连续和结构化参数攻击的通用防御

Bin Duan, Zeyu Bai, Guowei Yang

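Affiliations * School of Electrical Engineering and Computer Science, The University of Queensland, Australia

AI summary Proposes ParDef, a framework that combines keyed channel reparameterization, QC-LDPC quantization, and adaptive robust inference into a generalized defense against diverse parameter attacks, lowering attack success rates while maintaining high model performance.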

Details
AI Chinese abstract

深度神经网络越来越多地部署在异构和部分不可信的环境中,模型通过云存储、CI/CD 流水线、容器化服务和边缘执行平台进行分发。这种广泛的部署场景使模型参数面临各种完整性风险。与输入空间对抗攻击不同,参数攻击直接篡改模型的内部参数,并持续影响所有后续推理。现有防御要么需要重新训练,要么导致显著的精度下降,或者仅限于特定的攻击类别。然而,在实际部署场景中,参数攻击的形式往往不可预测。为了解决这一挑战,我们提出了 ParDef,一种针对深度神经网络面向多种类型参数攻击的通用防御。ParDef 集成了密钥通道重参数化(隐藏敏感参数方向)、QC-LDPC 量化(嵌入冗余并支持纠错)以及自适应鲁棒推理(在不确定性下稳定预测)。我们在 CIFAR-10、CIFAR-100 和 Tiny-ImageNet 上使用 ResNet 和 VGG 模型的评估表明,ParDef 在不同参数攻击下持续降低攻击成功率,同时保持较高的模型性能,且仅引入适度的部署开销。这些结果凸显了 ParDef 是一种实用且通用的 DNN 部署防御方案。

English abstract

Deep neural networks are increasingly deployed across heterogeneous and partially untrusted environments, where models are distributed through cloud storage, CI/CD pipelines, containerized services, and edge execution platforms. This broad deployment landscape exposes model parameters to various integrity risks. Unlike input-space adversarial attacks, parameter attacks directly tamper with the model's internal parameters and persist across all subsequent inferences. Existing defenses either require retraining, incur significant accuracy degradation, or are limited to specific attack classes. However, in real-world deployment scenarios, the forms of parameter attacks are often unpredictable. To address this challenge, we present ParDef, a generalized defense for deep neural networks against diverse types of parameter attacks. ParDef integrates keyed channel reparameterization, which obscures sensitive parameter directions, QC-LDPC quantization, which embeds redundancy and supports error correction, and adaptive robust inference, which stabilizes predictions under uncertainty. Our evaluation on CIFAR-10, CIFAR-100, and Tiny-ImageNet using ResNet and VGG models demonstrates that ParDef consistently reduces attack success rates across different parameter attacks while maintaining high model performance and incurring only moderate deployment overhead. These results highlight that ParDef is a practical and generalized defense for DNN deployments.

2606.04298 2026-06-04 cs.NI cs.AI

Anycast Performance in Context

上下文中的任播性能

Eric Liang

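Affiliations * Oracle

AI summary Compares anycast latency in root DNS and CDNs, proposes an optimization framework that separates resilience-driven from latency-driven objectives, and concludes that operators should not optimize root DNS and CDN anycast with the same objective function.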

Details
AI Chinese abstract

IP任播允许一个服务从多个物理站点通告一个地址,让BGP将每个客户端映射到一个站点。它是DNS根服务器系统、公共解析器和一些内容分发网络的核心,然而相同的路由机制在不同应用中有着截然不同的后果。本文比较了两种设置中的任播延迟:根DNS(其中递归缓存将根服务器延迟分摊到许多用户和长生存时间值上)和CDN(其中每次额外的往返直接影响页面加载、视频启动或API延迟)。综合发现,根DNS任播可能表现出显著的路径膨胀,但仍产生有限的用户可见延迟,而CDN任播需要主动工程化对等互联、路由策略、吸引范围和测量反馈以保持膨胀较小。本文贡献了一个比较延迟模型、一个可复现的测量设计以及一个将弹性驱动的任播目标与延迟驱动的目标分开的优化框架。核心结论是实用的:运营商不应使用相同的目标函数优化根DNS和CDN任播。对于根DNS,鲁棒性、可达性和缓存行为占主导地位;对于CDN服务,尾部延迟、吸引正确性和策略控制占主导地位。

English abstract

IP anycast lets a service advertise one address from many physical sites, leaving BGP to map each client to a site. It is central to the DNS root server system, public resolvers, and some content delivery networks, yet the same routing mechanism has very different consequences across applications. This paper compares anycast latency in two settings: root DNS, where recursive caching amortizes root-server delay over many users and long time-to-live values, and CDNs, where each additional round trip can directly affect page-load, video-start, or API latency. The synthesis finds that root DNS anycast can exhibit substantial path inflation while still producing limited user-visible delay, whereas CDN anycast requires active engineering of peering, route policy, catchment scope, and measurement feedback to keep inflation small. The paper contributes a comparative latency model, a reproducible measurement design, and an optimization framework that separates resilience-driven anycast objectives from latency-driven objectives. The central conclusion is practical: operators should not optimize root DNS and CDN anycast with the same objective function. For root DNS, robustness, reachability, and cache behavior dominate; for CDN services, tail latency, catchment correctness, and policy control dominate.
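
The contrast between the two settings can be made concrete with a back-of-the-envelope sketch. This is not the paper's comparative latency model: it assumes only that a request pays the anycast path inflation when it misses a cache, and the hit ratios and inflation values below are invented for illustration.

```python
def user_visible_delay_ms(path_inflation_ms, cache_hit_ratio):
    """Expected extra delay per user request when only cache misses reach the anycast service."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be in [0, 1]")
    return (1.0 - cache_hit_ratio) * path_inflation_ms


# Hypothetical root DNS case: large inflation, but recursive caches absorb almost every lookup.
root_dns_delay = user_visible_delay_ms(path_inflation_ms=100.0, cache_hit_ratio=0.99)

# Hypothetical CDN case: smaller inflation, but every request pays it.
cdn_delay = user_visible_delay_ms(path_inflation_ms=20.0, cache_hit_ratio=0.0)
```

Under these assumed numbers the root DNS path is five times more inflated yet adds far less delay per user request, which is consistent with the abstract's point that the two services call for different optimization objectives.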

2606.04266 2026-06-04 cs.CR cs.LG

Long-Term and Short-Term Transistor Aging in Deep Neural Networks: Impact and Mitigation

深度神经网络中的长期与短期晶体管老化:影响与缓解

Alireza Sarmadi, Virinchi Roy Surabhi, Prashanth Krishnamurthy, Hussam Amrouch, Ramesh Karri, Farshad Khorrami

Affiliations * Dept. of Electrical and Computer Engineering, New York University (NYU) Tandon School of Engineering; School of Computation, Information and Technology, Technical University of Munich (TUM)

AI summary Studies how long-term and short-term transistor aging affect the inference accuracy of deep neural networks and presents an aging-aware retraining method to mitigate the degradation.

Comments 28 pages, 16 figures

Details
AI Chinese abstract

深度神经网络(DNN)被用于各种实际应用,例如图像分类和语音识别。在集成电路(IC)的硬件上实现的DNN的推理精度会在晶体管老化等现象下下降。老化会减慢晶体管的开关速度,由于时钟无法维持而导致系统级时序违规。为了在整个预期寿命内保持可靠性,设计人员添加保护带以防止时序违规;然而,添加大的时序保护带会导致性能(速度或吞吐量)损失。本章详细讨论了长期和短期晶体管老化对DNN推理精度的影响。此外,为了减轻老化对DNN精度的影响并控制它们,提出了一种老化感知重训练方法,以生成即使在激进(即小于所需)保护带下也具有弹性的DNN。这提高了DNN在老化引起的退化情况下的推理精度。本章在用于图像分类的DNN硬件实现上,使用现成的图像数据集讨论了这些影响以及缓解策略。还简要讨论了短期老化作为检测集成电路中硬件木马的激励机制的应用。

English abstract

Deep neural networks (DNNs) are used in a variety of real-world applications including, for example, image classification and speech recognition. The inference accuracy of DNNs implemented on hardware in integrated circuits (ICs) degrades under phenomena such as transistor aging. Aging slows down the switching speed of transistors, resulting in system-level timing violations due to unsustainable clocks. To maintain reliability for the entire projected lifetime, designers add guardbands to prevent timing violations; however, adding large timing guardbands causes losses in performance (speed or throughput). This chapter provides a detailed discussion of the effects of long-term and short-term transistor aging on DNN inference accuracy. Furthermore, to mitigate the effects of aging on DNN accuracy, a methodology for aging-aware retraining is presented to generate a resilient DNN even when aggressive (i.e., smaller than required) guardbands are used. This improves the inference accuracy of the DNNs even in the presence of aging-induced degradation. These effects are discussed in this chapter along with mitigation strategies on a hardware implementation of a DNN for image classification on an off-the-shelf image dataset. The application of short-term aging as an excitation mechanism for the detection of hardware Trojans in integrated circuits is also briefly discussed.

2606.04265 2026-06-04 math.OC cs.LG cs.NA math.NA

Nonlocal Mean Field Schrödinger Bridge with Learned Interactions

具有学习相互作用的非局部平均场薛定谔桥

Daisuke Inoue, Mathieu Laurière, Dante Kalise

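Affiliations * Department of Mathematics, Imperial College London; Shanghai Frontiers Science Center of Artificial Intelligence and Deep Learning; NYU-ECNU Institute of Mathematical Sciences, NYU Shanghai

AI summary Proposes a Mean-Field Schrödinger Bridge method that approximates nonlocal interactions with neural network surrogates, reducing the per-step inference cost from quadratic to linear in population size, and derives stability bounds on how surrogate errors propagate.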

Comments 31 pages, 15 figures

Details
AI Chinese abstract

薛定谔桥问题构建一个以最小能量连接初始分布和终端分布的随机过程。本文考虑其平均场扩展,即平均场薛定谔桥,用于相互作用粒子系统。对于非局部相互作用,评估产生的依赖于粒子的分布项的计算量随种群规模呈二次增长,这使得大规模问题难以处理。我们通过使用神经网络代理近似非局部相互作用来解决这一瓶颈。由此产生的四阶段交替算法将推理时每步成本从种群规模的二次降低到线性。我们还推导了Grönwall型稳定性界限,显示代理误差如何传播到生成的轨迹。在导航和意见动力学任务的数值实验中,所提出的方法再现了通过解析评估获得的轨迹,并减少了训练时间。

English abstract

The Schrödinger Bridge Problem constructs a stochastic process that connects an initial distribution to a terminal distribution with minimum energy. This work considers its mean-field extension, the Mean-Field Schrödinger Bridge, for interacting particle systems. With nonlocal interactions, evaluating the resulting particle-dependent distributional terms can scale quadratically with the population size, which makes large-scale problems intractable. We address this bottleneck by approximating the nonlocal interactions with neural network surrogates. The resulting four-stage alternating algorithm reduces the per-step cost from quadratic to linear in the population size at inference. We also derive Grönwall-type stability bounds that show how surrogate errors propagate to the generated trajectories. In numerical experiments on navigation and opinion-dynamics tasks, the proposed method reproduces trajectories obtained with analytical evaluation and reduces training time.