arXivDaily: arXiv Daily Academic Digest, updated Monday to Friday
All subject categories: 4033
2605.08777 2026-05-12 stat.ML cs.LG math.PR

Measuring and Decomposing Mode Separation via the Canonical Diffusion

Shaul Tolkovsky, Ori Meidler, Or Zuk

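AI summary: This paper studies how to measure mode separation in densities, that is, how sharply a distribution fragments into clusters separated by barriers, a property that is hard to quantify in high dimensions. The authors use a reversible diffusion process whose stationary distribution is the density and extract two readouts from its autocovariance matrix: SSA (Sum of Squared Autocorrelations), which measures barrier-sensitive separation, and DA (Dominant Autocorrelation directions), which capture metastable structure. The method requires only samples and a score function, scales to high-dimensional data, and is validated on synthetic Gaussian mixtures, text-to-image generation, and molecular dynamics.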

Abstract

Mode separation, namely how sharply a distribution fragments into barrier-separated clusters, is a fundamental geometric property of densities, difficult to quantify in high dimensions. It is structurally distinct from dispersion, yet existing tools fall short: differential entropy rises with spread regardless of fragmentation, PCA orders directions by variance regardless of barriers, and mutual information requires a mixture decomposition one usually does not have. We measure mode separation through a single stochastic process intrinsic to the density: a unique reversible diffusion with $f$ as its stationary distribution and constant scalar diffusion coefficient. We extract two readouts from its autocovariance matrix: SSA (Sum of Squared Autocorrelations), a scalar barrier-sensitive measure; and DA (Dominant Autocorrelation directions), linear projections ordered by metastability rather than variance. Under an isotropic-Gaussian null, we derive a closed-form spectrum for the empirical autocovariance that generalizes Marchenko--Pastur, with an analytic upper edge that selects the lag at which DA is read off. Both readouts use only samples and a score function, scaling to high dimensions through pretrained score-based generative models via Tweedie's identity. We apply our framework to three settings: (i) synthetic Gaussian mixtures, where SSA tracks mutual information; (ii) SDXL text-to-image generations, where SSA and DA capture structure that entropy and PCA miss; and (iii) molecular dynamics of alanine dipeptide, where DA recovers the known slow backbone dihedrals from static samples alone.
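Both readouts rest on one mechanism. Start the unit-diffusion Langevin process $dX_t = \nabla \log f(X_t)\,dt + \sqrt{2}\,dW_t$, whose stationary law is $f$, from samples of $f$, then measure how strongly the state at a given lag still correlates with its starting point. The Python sketch below is a toy illustration under stated assumptions and is not the authors' estimator. It uses an analytic Gaussian-mixture score in place of a pretrained model, Euler-Maruyama steps, and a hypothetical SSA definition: the squared Frobenius norm of the whitened lag autocorrelation matrix. Under these assumptions, a barrier-separated bimodal density should score higher than a unimodal Gaussian with the same variance.

```python
import numpy as np

def mixture_score(x, means, sigma):
    """Score grad log f(x) of an equal-weight isotropic Gaussian mixture.
    x: (n, d) points, means: (K, d) component centers, sigma: shared std."""
    diff = x[:, None, :] - means[None, :, :]               # (n, K, d)
    logits = -np.sum(diff**2, axis=-1) / (2 * sigma**2)     # (n, K)
    logits -= logits.max(axis=1, keepdims=True)             # numerical stability
    resp = np.exp(logits)
    resp /= resp.sum(axis=1, keepdims=True)                 # posterior responsibilities
    return -np.einsum("nk,nkd->nd", resp, diff) / sigma**2

def sample_mixture(n, means, sigma, rng):
    idx = rng.integers(len(means), size=n)
    return means[idx] + sigma * rng.standard_normal((n, means.shape[1]))

def langevin(x0, score, lag, dt, rng):
    """Euler-Maruyama for dX = score(X) dt + sqrt(2) dW; f is its stationary law."""
    x = x0.copy()
    for _ in range(int(round(lag / dt))):
        x = x + score(x) * dt + np.sqrt(2 * dt) * rng.standard_normal(x.shape)
    return x

def ssa(x0, xt):
    """Assumed SSA: squared Frobenius norm of C(0)^{-1/2} C(lag) C(0)^{-1/2}."""
    mu = x0.mean(axis=0)
    a, b = x0 - mu, xt - mu
    c0 = a.T @ a / len(a)                                   # lag-0 covariance
    ct = a.T @ b / len(a)                                   # lag autocovariance
    evals, evecs = np.linalg.eigh(c0)
    w = evecs @ np.diag(evals**-0.5) @ evecs.T              # whitening matrix
    return float(np.sum((w @ ct @ w) ** 2))

rng = np.random.default_rng(0)
sigma, means = 0.5, np.array([[-3.0], [3.0]])               # two wells, high barrier
x0 = sample_mixture(4000, means, sigma, rng)
xt = langevin(x0, lambda x: mixture_score(x, means, sigma), lag=1.0, dt=0.01, rng=rng)
ssa_bimodal = ssa(x0, xt)

var = 9.0 + sigma**2                                        # same total variance, one mode
g0 = np.sqrt(var) * rng.standard_normal((4000, 1))
gt = langevin(g0, lambda x: -x / var, lag=1.0, dt=0.01, rng=rng)
ssa_gauss = ssa(g0, gt)
print(ssa_bimodal > ssa_gauss)  # barrier-separated modes decorrelate more slowly
```

In one dimension the whitened matrix reduces to the scalar lag autocorrelation, so this assumed SSA is simply its square.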

2605.08766 2026-05-12 cs.IR cs.CL

UserGPT Technical Report

Yunyi Xuan, Hao Yi, Fengling Mao, Daye Cai, Leikun Liang, Xingsheng He, Jiangnan Xie, Guoshuai Wang, Yushan Han, Wenwen Guo, Xiaoxiao Xu, Lin Qu

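AI summary: This paper studies how to use large language models (LLMs) to generate coherent user narratives from large-scale digital traces, enabling more accurate and personalized user understanding. To address the scarcity of real-world behavioral data, the authors propose the UserGPT framework, which comprises a user behavior simulation engine and a data semantization module, and they design a curriculum-based fine-tuning strategy to improve the model's reasoning over long behavioral histories. Experiments show that UserGPT performs well at generating user tags and behavior summaries, substantially compressing behavioral records while preserving key information.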

Abstract

Personalized user understanding from large-scale digital traces remains a fundamental challenge. Traditional user profiling methods rely on discriminative models and manual feature engineering to predict discrete attributes, often producing fragmented and logically inconsistent profiles that generalize poorly to long-tail behaviors. In this work, we study a generative paradigm in which large language models (LLMs) summarize long and noisy behavioral histories into coherent narratives that capture nuanced user evolution. Our experiments show that even strong LLMs remain limited in complex and implicit personalization reasoning. We propose UserGPT, a framework for improving LLM-based persona understanding through both attribute generation and summary generation. To address the scarcity of real-world behavioral data, we develop a User Behavior Simulation Engine that produces realistic and complex user trajectories. We further introduce a Data-Centric Semantization module that transforms heterogeneous behavioral logs into structured and semantically coherent inputs, reducing noise and sparsity. On top of this pipeline, we design a curriculum-driven post-training strategy that combines multi-stage Supervised Fine-Tuning (SFT) with Dual-Filter Group Relative Policy Optimization (DF-GRPO) to strengthen reasoning over long behavioral histories. We also construct HPR-Bench, a benchmark for holistic persona reasoning derived from simulated data. On HPR-Bench, UserGPT achieves an Avg@10 score of 0.7325 on tag prediction and an $Acc_{Ex}$ score of 0.7528 on summary generation, while compressing behavioral records by up to 97.9% with critical information preserved. These results demonstrate the effectiveness of UserGPT for holistic persona reasoning and personalized user-agent interaction.

2605.08761 2026-05-12 cs.MA cs.LG

Beyond the All-in-One Agent: Benchmarking Role-Specialized Multi-Agent Collaboration in Enterprise Workflows

Tao Yu, Hao Wang, Changyu Li, Shenghua Chai, Minghui Zhang, Zhongtian Luo, Yuxuan Zhou, Haopeng Jin, Zhaolu Kang, Jiabing Yang, YiFan Zhang, Xinming Wang, Hongzhu Yi, Zheqi He, Jing-Shu Zheng, Xi Yang, Yan Huang, Liang Wang

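AI summary: This work introduces a new benchmark, EntCollabBench, for evaluating multi-agent collaboration in enterprise workflows. The benchmark simulates an organization with permission isolation and role division, comprising 11 specialized roles across six departments. It includes two evaluation subsets, Workflow and Approval, which emphasize system state modification and policy-based decision-making, respectively. Experiments show that current large language model agents still fall clearly short in end-to-end collaboration, task handoff, and decision commitment. The benchmark provides a reproducible testbed for evaluating and improving enterprise-grade agent systems.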

Comments 45 pages

Abstract

Large language model (LLM) agents are increasingly expected to operate in enterprise environments, where work is distributed across specialized roles, permission-controlled systems, and cross-departmental procedures. However, existing enterprise benchmarks largely evaluate single agents with broad tool access, while existing multi-agent benchmarks rarely capture realistic enterprise constraints such as role specialization, access control, stateful business systems, and policy-based approvals. We introduce EntCollabBench, a benchmark for evaluating enterprise multi-agent collaboration. EntCollabBench simulates a permission-isolated organization with 11 role-specialized agents across six departments and contains two evaluation subsets: a Workflow subset, where agents collaboratively modify enterprise system states, and an Approval subset, where agents make policy-grounded decisions. Evaluation is based on execution traces, database state verification, and deterministic policy adjudication rather than natural-language response judging. Experiments with representative LLM agents show that current models still struggle with end-to-end enterprise collaboration, especially in delegation, context transfer, parameter grounding, workflow closure, and decision commitment. EntCollabBench provides a reproducible testbed for measuring and improving agent systems intended for realistic organizational environments.

2605.08744 2026-05-12 cs.GR cs.AI cs.LG

MeshFIM: Local Low-Poly Mesh Editing via Fill-in-the-Middle Autoregressive Generation

Dingdong Yang, Jian Liu, Biwen Lei, Haohan Weng, Zhuo Chen, Song Guo, Hao Richard Zhang, Ali Mahdavi Amiri, Chunchao Guo

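AI summary: This paper proposes MeshFIM, a local low-poly mesh editing method that uses conditional generation to regenerate only the unsatisfactory region rather than the whole mesh. This saves computation and preserves the structure of the remaining regions. MeshFIM addresses challenges specific to mesh editing, such as boundary alignment, topology preservation, and region overflow, with several key techniques, including boundary vertex markers, context positional embeddings, and expanded context width, which improve both the quality and efficiency of local editing. Experiments show that MeshFIM outperforms a range of baselines on mesh repair and whole-mesh generation, and supports applications such as interactive editing and automatic defect repair.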

Abstract

Autoregressive (AR) models can generate high-quality low-poly meshes from point clouds, but they still operate in an all-or-nothing manner: when a local region is unsatisfactory, the entire mesh must be regenerated, wasting computation and destroying satisfactory mesh structure elsewhere. We introduce MeshFIM, a Fill-in-the-Middle (FIM) framework that regenerates a target region of a low-poly mesh conditioned on the surrounding context. MeshFIM addresses three mesh-specific challenges: enforcing exact attachment along the exposed boundary, preserving topological order in the context, and suppressing overflow beyond the intended region. It does so with five complementary design choices: boundary vertex markers, context positional embeddings, expanded context width, context augmentation, and a low-poly geometry encoder whose gated subtraction mechanism focuses generation on the missing region by leveraging the difference between the reference surface and the existing mesh. Detailed ablation studies show the effectiveness of every introduced component. Based on MeshFIM, we demonstrate two applications: interactive brush-based editing and automatic defect repair on low-poly meshes (see Figure 1). Finally, experiments show that MeshFIM outperforms a range of baselines in mesh refinement, mesh repair, and whole-mesh generation with a stitch-back scheme.

2605.08687 2026-05-12 cs.DB cs.AI

PrepBench: How Far Are We from Natural-Language-Driven Data Preparation?

Jingzhe Xu, Rui Wang, Jiannan Wang, Guoliang Li

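AI summary: Data preparation is a core and time-consuming stage of the data analysis pipeline. Traditional tools rely on graphical user interfaces, but recent advances in large language models have made natural-language-driven data preparation possible. To assess how far current LLM-based systems have progressed in this direction, the authors propose the PrepBench benchmark, which covers three core capabilities: interactive disambiguation, prep-code generation, and code-to-workflow translation. Its tasks span multiple domains and involve complex multi-step pipelines. Experiments show that state-of-the-art models still face challenges in realizing natural-language-driven data preparation.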

Abstract

Data preparation is a central and time-consuming stage in data analysis workflows. Traditionally, commercial tools have relied on graphical user interfaces (GUIs) to simplify data preparation, allowing users to define transformations through visual operators and workflows. Recent advances in large language models (LLMs) raise the possibility of a paradigm shift toward natural language (NL)-driven data preparation, in which users can specify preparation intents in NL directly. However, it remains unclear how far current LLM-based agents are from this paradigm shift in practice. Existing code generation benchmarks do not capture key characteristics of data preparation, including ambiguous user intents, imperfect real-world data, and the need to translate code into interpretable workflows for validation. To bridge this gap, we present PrepBench, a benchmark designed to evaluate NL-driven data preparation along three core capabilities: interactive disambiguation, prep-code generation, and code-to-workflow translation. We crawl data from the Preppin' Data Challenges, and then extend it into a systematically designed benchmark. The benchmark covers diverse domains, and each task involves 3 to 18 data preparation steps. Nearly half of the tasks require over 100 lines of Python code, and the longest solutions approach 300 lines. Our evaluation shows that, despite recent progress, realizing this paradigm shift remains challenging for state-of-the-art LLMs. PrepBench provides a principled benchmark for measuring this gap and helps identify key challenges toward realizing NL-driven data preparation.

2605.08681 2026-05-12 stat.ML cs.AI cs.LG cs.NA math.NA

Core-Halo Decomposition: Decentralizing Large-Scale Fixed-Point Problems

Haixiang, Yang Xu, Jiefu Zhang, Xudong Wu, Zihan Zhou, Jun He, Jiayu Chen

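AI summary: This paper studies how to solve the large-scale fixed-point equation $x^\star = \bar{F}(x^\star)$ by decomposition. Standard strict decomposition assigns variables to different agents, but this truncates dependencies and introduces structural bias. The authors therefore propose Core-Halo decomposition, which separates write operations from read operations: each agent updates its own core variables while reading overlapping halo variables, so that the original fixed-point problem is implemented faithfully. Experiments show that the method approaches centralized performance while retaining the benefits of decentralization.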

Abstract

We study solving the large-scale fixed-point equation $x^\star=\bar F(x^\star)$ by decomposition. Standard strict decomposition assigns each agent a disjoint block and evaluates updates using only owned coordinates. For most operators, however, a block update may depend on variables outside the block. Truncating these dependencies by strict decomposition changes the mean operator and creates structural bias that cannot be removed by more samples, smaller stepsizes, or additional consensus. We therefore propose Core-Halo decomposition, which separates write ownership from read-only evaluation context: each agent updates its own core and reads from an overlapping halo. By aligning the Core-Halo decomposition with the block-dependence structure of $\bar F$, the original fixed-point problem can be implemented faithfully in a decentralized multi-agent system. We further characterize the fundamental obstruction faced by strict decomposition through a Bellman closure condition and a blockwise bias lower bound, showing that local-only updates can alter the original fixed-point operator. Finally, we conduct extensive experiments across a range of application settings and demonstrate that Core-Halo achieves near-centralized performance while retaining the parallelism benefits of decentralization.
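A toy linear example makes the bias argument concrete. In the Python sketch below, which is an illustration under our own assumptions and not the authors' setup, $\bar F(x) = Ax + b$ with a tridiagonal contraction $A$, and two agents each own a contiguous core. The halo is taken to be exactly the out-of-core coordinates that the core rows of $A$ read. Strict decomposition drops those reads and converges to a different fixed point. Core-Halo reads them but writes only its own core, and it recovers the centralized solution.

```python
import numpy as np

def halo_of(A, core):
    """Read-only coordinates outside `core` that the core rows of A depend on."""
    deps = np.nonzero(np.any(A[core] != 0, axis=0))[0]
    return np.setdiff1d(deps, core)

def iterate(A, b, cores, use_halo, iters=200):
    """Synchronous block iteration of x <- A x + b; each agent writes only its core."""
    x = np.zeros_like(b)
    for _ in range(iters):
        new_x = x.copy()
        for core in cores:
            read = np.concatenate([core, halo_of(A, core)]) if use_halo else core
            # Strict decomposition silently drops A's columns outside `read`.
            new_x[core] = A[np.ix_(core, read)] @ x[read] + b[core]
        x = new_x                                  # agents exchange updated cores
    return x

n = 8
A = 0.3 * (np.eye(n, k=1) + np.eye(n, k=-1))        # neighbor coupling, a contraction
b = np.arange(1, n + 1, dtype=float)
x_star = np.linalg.solve(np.eye(n) - A, b)          # centralized fixed point
cores = [np.arange(0, 4), np.arange(4, 8)]          # two agents
x_strict = iterate(A, b, cores, use_halo=False)
x_halo = iterate(A, b, cores, use_halo=True)
print(np.abs(x_strict - x_star).max())              # clearly nonzero: structural bias
print(np.abs(x_halo - x_star).max())                # ~0 up to floating point
```

Running more iterations does not shrink the strict-decomposition error, because the iteration converges to the fixed point of a different, truncated operator.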

2605.08680 2026-05-12 cs.SE cs.AI cs.LG

Semantic Voting: Execution-Grounded Consensus for LLM Code Generation

Shan Jiang, Zijian Yi, Chenguang Zhu

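AI summary: This paper studies how to select a final program in LLM-based code generation without access to a complete oracle. The authors study Semantic Voting, which executes candidate programs on LLM-generated inputs and clusters them by their execution results to improve selection accuracy. Experiments show that execution-based selection substantially outperforms traditional output-pattern voting across configurations, and that input quality is the most important factor: sketch-based input generation outperforms both direct generation and random fuzzing.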

Abstract

LLM code-generation pipelines often sample multiple candidates and select one final answer without access to a complete oracle. Existing pipelines mix textual voting, ranking, and execution-based agreement, but the relative contribution of each component remains unclear. We study 18 configurations across different models, thinking levels, and benchmarks, comparing output-pattern majority voting, weighted voting, MBR-Exec, and SemanticVote, a method that clusters candidates by execution fingerprints on LLM-generated inputs. Three findings emerge. (1) The best execution-based selector exceeds output-pattern majority voting by 19-52 percentage points on every configuration, with every execution-based selector exceeding it by at least 18 points. (2) Once candidates are executed on diverse inputs, the aggregation rule has limited effect: SemanticVote, weighted voting, and MBR-Exec are statistically indistinguishable across all 18 configurations. The largest factor is input quality: sketch-based input generation consistently outperforms direct LLM generation by 0.6-2.1 pp and random fuzzing by up to 11.3 pp. (3) Thinking level interacts differently with selection families: deeper thinking improves majority voting by 12 pp, but execution-based methods stay flat or degrade as candidate diversity falls. These results frame inference-time code selection as a signal-quality problem rather than an aggregation-rule problem: when oracles are unavailable, the behavioral evidence matters more than the aggregation rule.
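The clustering step can be sketched in a few lines of Python. This is a minimal illustration that assumes a plain largest-cluster rule. The paper's SemanticVote may weight clusters or break ties differently, and the hand-written inputs below stand in for LLM-generated ones. A candidate that raises an exception is fingerprinted by the exception type, so crashing behavior also separates clusters.

```python
from collections import defaultdict
from typing import Callable, Sequence

def fingerprint(candidate: Callable, inputs: Sequence) -> tuple:
    """Execution fingerprint: the candidate's output (or exception type) on each input."""
    outs = []
    for x in inputs:
        try:
            outs.append(repr(candidate(x)))
        except Exception as exc:               # crashes are part of the behavior
            outs.append(f"<error:{type(exc).__name__}>")
    return tuple(outs)

def semantic_vote(candidates: Sequence[Callable], inputs: Sequence) -> int:
    """Index of the first candidate in the largest behavioral cluster."""
    clusters = defaultdict(list)
    for i, cand in enumerate(candidates):
        clusters[fingerprint(cand, inputs)].append(i)
    largest = max(clusters.values(), key=len)  # ties go to the first cluster seen
    return largest[0]

candidates = [
    lambda n: sum(range(n + 1)),   # correct: 0 + 1 + ... + n
    lambda n: n * (n + 1) // 2,    # correct, textually different
    lambda n: sum(range(n)),       # off by one
    lambda n: n * (n - 1) // 2,    # same off-by-one behavior
    lambda n: n * (n + 1) // 2,    # correct
]
inputs = [0, 1, 5, 10]             # stand-in for LLM-generated test inputs
print(semantic_vote(candidates, inputs))  # 0
```

Output-pattern voting on source text would treat candidates 0 and 1 as different answers. Execution fingerprints merge them because they behave identically on every input.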

2605.08679 2026-05-12 cs.SI cs.AI cs.LG

Attention-based graph neural networks: a survey

Chengcheng Sun, Chenhao Li, Xiang Lin, Tianji Zheng, Fanrong Meng, Xiaobin Rui, Zhixiao Wang

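AI summary: This paper surveys recent advances in attention-based graph neural networks (GNNs). It systematically reviews their development history and typical architectures and proposes a two-level taxonomy covering three developmental stages and multiple architecture types. The survey reviews each class of methods in detail, summarizes their strengths and weaknesses, and provides a comparison table of model characteristics. It also discusses open challenges and future research directions, serving as a comprehensive reference for related research.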

Comments This is the accepted manuscript of an article published in Artificial Intelligence Review. The final version is available online at: [10.1007/s10462-023-10577-2](https://link.springer.com/article/10.1007/s10462-023-10577-2)

Journal ref
Artif Intell Rev 56 (Suppl 2), 2263-2310 (2023)
Abstract

Graph neural networks (GNNs) aim to learn representations in a lower-dimensional space for downstream tasks while preserving the topological structure of graphs. In recent years, the attention mechanism, which has proven highly effective in natural language processing and computer vision, has been introduced to GNNs to adaptively select discriminative features and automatically filter out noisy information. To the best of our knowledge, owing to the fast pace of advances in this domain, a systematic overview of attention-based GNNs is still missing. To fill this gap, this paper provides a comprehensive survey of recent advances in attention-based GNNs. First, we propose a novel two-level taxonomy for attention-based GNNs from the perspectives of developmental history and architecture. Specifically, the upper level reveals three developmental stages of attention-based GNNs: graph recurrent attention networks, graph attention networks, and graph transformers. The lower level focuses on the typical architectures of each stage. Second, we review these attention-based methods in detail following the proposed taxonomy and summarize the advantages and disadvantages of the various models. A model characteristics table is also provided for a more comprehensive comparison. Third, we share our thoughts on open issues and future directions of attention-based GNNs. We hope this survey will provide researchers with an up-to-date reference on attention-based GNNs and their applications. In addition, to keep pace with rapid developments in this field, we intend to share the latest relevant papers as an open resource at https://github.com/sunxiaobei/awesome-attention-based-gnns.
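As a concrete instance of the second stage in this taxonomy, a single-head graph attention layer in the style of the original graph attention network (GAT) computes $e_{ij} = \mathrm{LeakyReLU}(a^\top [W h_i \,\|\, W h_j])$ and normalizes these scores with a softmax over each node's neighbors. The NumPy sketch below is illustrative only, and its names and shapes are assumptions. Practical implementations add multiple heads, output nonlinearities, and sparse neighborhood storage.

```python
import numpy as np

def gat_attention(h, adj, W, a, slope=0.2):
    """Single-head GAT-style layer.
    h: (N, F) features, adj: (N, N) 0/1 adjacency with self-loops,
    W: (F, F2) projection, a: (2 * F2,) attention vector."""
    z = h @ W                                          # projected features (N, F2)
    f2 = z.shape[1]
    # a^T [z_i || z_j] splits into a term for node i plus a term for node j
    e = (z @ a[:f2])[:, None] + (z @ a[f2:])[None, :]
    e = np.where(e > 0, e, slope * e)                  # LeakyReLU
    e = np.where(adj > 0, e, -np.inf)                  # mask non-neighbors
    e = e - e.max(axis=1, keepdims=True)
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)   # softmax over each neighborhood
    return alpha @ z, alpha                            # aggregated features, attention

rng = np.random.default_rng(0)
adj = np.eye(4) + np.eye(4, k=1) + np.eye(4, k=-1)     # path graph plus self-loops
h = rng.standard_normal((4, 3))
out, alpha = gat_attention(h, adj, rng.standard_normal((3, 2)), rng.standard_normal(4))
print(alpha.round(3))
```

Masking with $-\infty$ before the softmax gives non-neighbors exactly zero weight. The self-loops guarantee that every row contains at least one finite score.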

2605.08645 2026-05-12 physics.plasm-ph cs.LG

Energy-based models for diagnostic reconstruction and analysis in a laboratory plasma device

Phil Travis, Troy Carter

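AI summary: This paper applies energy-based models (EBMs) to laboratory plasma physics for diagnostic reconstruction and analysis. By constructing an energy surface, the model learns the joint probability distribution of the data, enabling in-depth analysis of complex nonlinear plasma phenomena as well as conditional sampling. The authors train an EBM that combines convolutional neural networks and attention on data from the Large Plasma Device (LAPD). They demonstrate its practical value for diagnostic reconstruction and inverse problems, along with its potential for anomaly detection, providing a new analysis tool for plasma physics research.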

Comments 15 pages, 10 figures

Abstract

Energy-based models (EBMs) provide a powerful and flexible way of learning a joint probability distribution over data by constructing an energy surface. This energy surface enables insight extraction and conditional sampling. We apply EBMs to laboratory plasma physics, a domain characterized by highly nonlinear phenomena. These phenomena are studied using plasma diagnostics, which are often difficult to analyze and subject to hardware degradation. In addition, the possible configuration space of a plasma device is sufficiently large that it cannot be efficiently searched using conventional analysis techniques. EBMs address these issues. At the Large Plasma Device (LAPD), a CNN- and attention-based EBM is trained on a set of randomly generated machine conditions and their corresponding diagnostic time series. We demonstrate diagnostic reconstruction using this EBM on real data and show that additional diagnostics reduce reconstruction error and improve generation quality. The energy surface is directly evaluated for an ill-posed inverse problem: inferring probe position from a time-series measurement. This inference illuminates symmetries in the data, potentially leading to a method of inquiry to supplement conventional data analysis. Trends in diagnostic signals are inferred via conditional sampling over machine inputs. In addition, this multimodal EBM is able to unconditionally reproduce all distributional modes, suggesting future potential in anomaly detection on the LAPD. Fundamentally, this work demonstrates the flexibility and efficacy of EBM-based generative modeling of laboratory plasma data, and showcases multiple practical uses of just a single trained EBM in the physical sciences.
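Using the energy surface for an inverse problem amounts to evaluating $E$ over candidate positions and normalizing, $p(\text{position} \mid \text{signal}) \propto \exp(-E)$. The Python sketch below replaces the trained CNN/attention EBM with a hypothetical closed-form energy built on a symmetric response $g(p) = p^2$. It therefore only shows how a symmetry in the data appears as several equally probable positions, and it does not model the LAPD.

```python
import numpy as np

def toy_energy(position, signal, noise=0.1):
    """Hypothetical energy: low when the signal matches a symmetric response g(p) = p**2."""
    return (signal - position**2) ** 2 / (2 * noise**2)

def posterior_over_positions(signal, grid):
    """Evaluate the energy on a grid of positions and normalize exp(-E)."""
    e = toy_energy(grid, signal)
    weights = np.exp(-(e - e.min()))     # shift by the minimum for numerical stability
    return weights / weights.sum()

grid = np.linspace(-2.0, 2.0, 401)
post = posterior_over_positions(1.0, grid)
peaks = np.sort(grid[np.argsort(post)[-2:]])   # two most probable positions
print(peaks)   # two symmetric modes near -1 and +1
```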

2605.08633 2026-05-12 cs.DC cs.CV

Transforming the Use of Earth Observation Data: Exascale Training of a Generative Compression Model with Historical Priors for up to 10,000x Data Reduction

Jinxiao Zhang, Runmin Dong, Xiyong Wu, Xihan Huang, Shenggan Cheng, Yunkai Yang, Zheng Zhou, Yunpu Xu, Zhaoyang Luo, Miao Yang, Fan Wei, Mengxuan Chen, Yang You, Juepeng Zheng, Weijia Li, Yutong Lu, Haohuan Fu

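AI summary: This work proposes a generative compression framework built on historical priors. The goal is to turn Earth observation data compression from a conventional storage and transmission tool into a new way of using data, with data reduction of up to 10,000x. Training at exascale on the LineShine Armv9 CPU supercomputer, the team co-optimized model design, kernels, memory hierarchy, runtime, and parallelism, sustaining 1.54 EFLOP/s and peaking at 2.16 EFLOP/s in end-to-end training. The approach exploits the fact that Earth observation repeatedly measures the same planet, which makes extreme compression feasible. It demonstrates the potential of historical-prior generative compression for data acquisition, delivery, storage, and scientific use.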

Abstract

Earth observation is becoming one of the largest data-producing activities in science, yet current pipelines still treat compression as a storage and transmission tool rather than a new way to use data. We present a generative compression framework that learns from historical Earth observation archives and enables on-demand 100x to 10,000x data reduction across downstream tasks. Unlike general visual data, Earth observation repeatedly measures the same evolving planet, making historical-prior learning feasible for extreme compression. To realize this paradigm, we train large generative compression models at exascale on the LineShine Armv9 CPU supercomputer, with co-optimization across model design, kernels, memory hierarchy, runtime, and parallelism. Our implementation sustains 1.54 EFLOP/s and peaks at 2.16 EFLOP/s in end-to-end training. This work shows that historical-prior generative compression can turn Earth observation data into an active, task-adaptive foundation for acquisition, delivery, storage, and scientific use.

2605.08626 2026-05-12 eess.SP cs.DC cs.LG cs.MA

Large Language Models over Networks: Collaborative Intelligence under Resource Constraints

Liangqi Yuan, Wenzhi Fang, Shiqiang Wang, H. Vincent Poor, Christopher G. Brinton

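AI summary: This paper studies how multiple devices and cloud-hosted large language models (LLMs) across a network can collaborate to deliver high-quality intelligent services under resource constraints. It presents two complementary modes of collaborative inference, vertical device-cloud collaboration and horizontal multi-agent collaboration, and discusses collaborative training for learning routing policies and developing cooperative capabilities among models. The main contribution is a collaborative intelligence framework suited to different resource constraints, together with key open research challenges in scaling under resource heterogeneity and in trustworthy collaboration.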

Abstract

Large language models (LLMs) are transforming society, powering applications from smartphone assistants to autonomous driving. Yet cloud-based LLM services alone cannot serve a growing class of applications, including those operating under intermittent connectivity, sub-second latency budgets, data-residency constraints, or sustained high-volume inference. On-device deployment is in turn constrained by limited computation and memory. No single endpoint can deliver high-quality service across this spectrum. This article focuses on collaborative intelligence, a paradigm in which multiple independent LLMs distributed across device and cloud endpoints collaborate at the task level through natural language or structured messages. Such collaboration strives for superior response quality under heterogeneous resource constraints spanning computation, memory, communication, and cost across network tiers. We present collaborative inference along two complementary and composable dimensions: vertical device-cloud collaboration and horizontal multi-agent collaboration, which can be combined into hybrid topologies in practice. We then examine learning to collaborate, addressing the training of routing policies and the development of cooperative capabilities among LLMs. Finally, we identify open research challenges including scaling under resource heterogeneity and trustworthy collaborative intelligence.

2605.08594 2026-05-12 cs.AR cs.IT cs.LG math.IT

FLARE: One-Shot PE-Level Fault Localization in Systolic Arrays via Algebraic Test Vectors

Logashree Venkatasubramanian, Zishen Wan, Viveck Cadambe

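AI summary: This paper proposes FLARE, an algorithmic method for one-shot PE-level fault localization in systolic arrays. Because the test vectors use pairwise coprime entries, the deviation produced by a faulty PE identifies that PE's row through its divisibility signature, enabling efficient fault localization without hardware redundancy. Results show that under INT16 arithmetic the method localizes faults in arrays of up to 256×256 with high probability, at a test cost below 1% of one inference GEMM tile.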

Abstract

Systolic arrays are the dominant compute fabric for neural network inference. Prior work has addressed column-level fault detection efficiently with uniform test patterns, but row-level (PE-level) fault localization within a faulty column remains open without resorting to hardware redundancy. The fundamental obstacle is that uniform test inputs destroy per-row signatures: any test that activates every row equally cannot distinguish which row is the source of an observed deviation. In this paper, we propose a lightweight, purely algorithmic remedy based on coprime test vectors. By assigning pairwise coprime integers as test-input entries, a permanent weight-register fault produces a deviation whose divisibility signature uniquely identifies the faulty row. Under a general bounded error model, a single test pass localizes the faulty row with high probability. This error model covers a broader class of faults than what prior dataflow-aware testing work has primarily emphasized. When one round is insufficient, a second pass using a ratio computation achieves exact localization; for the special case of single-bit errors, odd coprime entries guarantee exact localization in one round. For INT16 arithmetic, a single test pass covers array sizes up to $256{\times}256$ with localization probability above $0.98$, at a test cost under $1\%$ of one inference GEMM tile.
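The divisibility argument can be checked with a small Python model. Its simplifying assumptions are ours, not the paper's. A column computes $y = \sum_i x_i w_i$ in exact integer arithmetic, with no INT16 overflow, and a fault adds an error $e$ to one row's weight register, so the deviation is $e \cdot x_r$. The candidate rows are those whose test entry divides the deviation. With odd pairwise-coprime entries, a single-bit error ($e = \pm 2^b$) leaves exactly one candidate. A general error can leave several, and that is the case a second pass would resolve.

```python
from math import gcd

def pairwise_coprime(k, start=3, step=2):
    """Greedily pick k pairwise-coprime integers; odd by default (start=3, step=2)."""
    chosen, n = [], start
    while len(chosen) < k:
        if all(gcd(n, c) == 1 for c in chosen):
            chosen.append(n)
        n += step
    return chosen

def column_output(x, w):
    """Output of one systolic-array column: sum_i x_i * w_i (exact integers)."""
    return sum(xi * wi for xi, wi in zip(x, w))

def localize(x, w_golden, observed):
    """Candidate faulty rows: those whose test entry divides the observed deviation."""
    dev = observed - column_output(x, w_golden)
    if dev == 0:
        return []
    return [i for i, xi in enumerate(x) if dev % xi == 0]

x = pairwise_coprime(8)                  # [3, 5, 7, 11, 13, 17, 19, 23]
w = [3, -1, 4, 1, -5, 9, 2, 6]           # hypothetical golden weights of one column
single_bit = list(w)
single_bit[5] ^= 8                       # bit-3 flip in row 5: 9 -> 1, so e = -8
general = list(w)
general[5] += 6                          # a non-power-of-two error e = 6 in row 5
print(localize(x, w, column_output(x, single_bit)))  # [5]
print(localize(x, w, column_output(x, general)))     # [0, 5]: 3 divides 6, still ambiguous
```

Odd entries matter for the single-bit case. Any power of two is coprime to an odd $x_j$, so no other row's entry can divide $\pm 2^b x_r$.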

2605.08590 2026-05-12 cs.HC cs.AI cs.CL cs.CY

Causal Stories from Sensor Traces: Auditing Epistemic Overreach in LLM-Generated Personal Sensing Explanations

Shanshan Zhu, Han Zhang, J. Doris Chi, Subigya Nepal, Koustuv Saha

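AI summary: This study examines "epistemic overreach" by large language models (LLMs) when they explain personal sensing data, that is, explanations that go beyond what the available data can support. Using anomalous-day scenarios from three longitudinal sensing datasets of college students, the authors generate a large number of explanations with three major LLM families and evaluate them along dimensions such as causal attribution, unacknowledged data gaps, and overconfident language. Results show that LLMs frequently make causal inferences without sufficient evidence and that providing more context does not reliably reduce the problem. The authors conclude that evidential grounding must be treated rigorously when generating personal sensing explanations.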

Abstract

LLMs are increasingly used to explain personal sensing data, translating traces of activity and mood into natural-language accounts of why an anomalous day may have occurred. However, such explanations can sound coherent and personally meaningful even when the underlying evidence is sparse or missing. We introduce epistemic overreach (EO) as a measure for cases where a generated explanation implies more than the available sensing evidence can justify. To audit how often and in what forms EO occurs, we obtained anomalous-day scenarios from three longitudinal sensing datasets of college students: StudentLife, GLOBEM, and CollegeExperience. Across activity, sleep, and affect anomalies, we generated 14,922 explanations using three LLM families -- Llama, Qwen, and GPT -- under two prompting conditions: one minimally constrained prompt and another prompt explicitly instructing models to bound claims to the data. For each scenario, we varied the amount of behavioral evidence available to the model to examine whether more evidence reduces EO. We evaluated each explanation using a structured rubric, decomposing EO into the dimensions of unsupported causal attribution, unacknowledged data gaps, overconfident language, temporal inconsistency, and diagnostic inference. We find that LLMs routinely attribute anomalous days to causes without sufficient support from the data, and that this pattern replicates across datasets, anomaly types, and model families. Further, providing richer context does not reliably reduce EO; bounded prompting helps but does not eliminate it. These findings suggest that evidential grounding should be a first-order evaluation criterion for LLM-generated personal sensing explanations, alongside fluency and plausibility. We argue that personal sensing explanations require evidential discipline: systems must distinguish what is observed, what is inferred, and what remains unknown.

2605.08580 2026-05-12 cs.MA cs.AI

Slipstream: Trajectory-Grounded Compaction Validation for Long-Horizon Agents

Zhuofu Chen, Rui Pan, Yinwei Dai, Ravi Netravali

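AI summary: To cope with the large contexts that long-horizon LLM agents produce, this work proposes asynchronous compaction. The compactor runs in parallel with continued agent execution on the original context, which yields a validation signal that is independent of the compacted summary and closes a structural validation gap. The resulting Slipstream system uses a judge to verify that a candidate summary preserves the agent's forward intent and the key facts and constraints it depends on, improving task accuracy while reducing end-to-end latency.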

Comments 9 pages (16 pages counting references, appendix), 6 figures, 2 tables

Abstract

To cope with the large contexts that long-horizon LLM agents produce, modern frameworks increasingly rely on compaction -- invoking an LLM to rewrite the accumulated trajectory into a shorter summary that the agent resumes from. Today, compaction runs synchronously on the critical path of agent execution, but this can unpredictably degrade accuracy due to a structural validation gap: the compactor must condense context but is fundamentally unaware of precisely what information the agent will need later. Further, because post-compaction agent steps are conditioned on the new summary, targeted validation criteria do not exist and errors silently propagate through coherent but incorrect behavior. Our key insight is that asynchronous compaction efficiently addresses this gap: by running the compactor in parallel with continued agent execution on the original context, the candidate summary and the agent's next steps are generated independently from the same pre-compaction state, yielding a validation signal independent of the summary itself. We build Slipstream, a trajectory-grounded compaction system that uses a judge to validate the candidate summary against the agent's continued reasoning, checking that it preserves both the agent's forward intent and the key facts and constraints it depends on. Across long-horizon coding (SWE-bench Verified) and web-browsing (BrowseComp) workloads, Slipstream improves task accuracy by up to 8.8 percentage points while reducing end-to-end latency by up to 39.7%.
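The asynchronous structure can be sketched with Python's asyncio. Everything in this sketch is hypothetical. `compact`, `agent_steps`, and `judge` stand in for LLM calls, and this judge only checks that the facts and constraints the continued agent relied on survive in the summary, whereas the paper's judge also checks forward intent. The key point is that the summary and the continuation are produced concurrently from the same pre-compaction trajectory, so the continuation provides a validation signal that does not depend on the summary.

```python
import asyncio

async def compact(trajectory):
    """Hypothetical compactor: keeps only lines tagged as facts."""
    await asyncio.sleep(0)                    # placeholder for an LLM call
    return [line for line in trajectory if line.startswith("fact:")]

async def agent_steps(trajectory):
    """Hypothetical agent continuing on the ORIGINAL trajectory; reports what it used."""
    await asyncio.sleep(0)                    # placeholder for further agent steps
    return {"intent": "edit the tokenizer",
            "depends_on": ["fact: tests live in tests/unit",
                           "constraint: do not touch setup.py"]}

def judge(summary, continuation):
    """Accept the summary only if everything the continuation relied on survives."""
    missing = [item for item in continuation["depends_on"] if item not in summary]
    return not missing, missing

async def compaction_round(trajectory):
    # Compactor and agent start concurrently from the same pre-compaction state.
    summary, continuation = await asyncio.gather(compact(trajectory), agent_steps(trajectory))
    accepted, missing = judge(summary, continuation)
    return summary, accepted, missing

trajectory = ["fact: tests live in tests/unit",
              "constraint: do not touch setup.py",
              "thought: the bug is in the tokenizer"]
summary, accepted, missing = asyncio.run(compaction_round(trajectory))
print(accepted, missing)  # False ['constraint: do not touch setup.py']
```

Here the compactor drops a constraint the agent still needs. A synchronous pipeline would resume from the lossy summary without noticing, while the concurrent continuation exposes the loss.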

2605.08561 2026-05-12 stat.ML cs.LG

CONTRA: Conformal Prediction Region via Normalizing Flow Transformation

Zhenhan Fang, Aixin Tan, Jian Huang

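AI summary: This paper proposes CONTRA, a new method for constructing reliable prediction regions for multi-dimensional outputs. It defines nonconformity scores in the latent space of a normalizing flow, overcoming the loose prediction regions that traditional methods produce for high-dimensional outputs. CONTRA yields sharper prediction regions. It can also be combined with existing predictive models to give their predictions reliable regions, and it applies to a wide range of datasets.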

Comments 18 pages, 7 figures and 5 tables

Journal ref
International Conference on Learning Representations 2025
Abstract

Density estimation and reliable prediction regions for outputs are crucial in supervised and unsupervised learning. While conformal prediction effectively generates coverage-guaranteed regions, it struggles with multi-dimensional outputs due to reliance on one-dimensional nonconformity scores. To address this, we introduce CONTRA: CONformal prediction region via normalizing flow TRAnsformation. CONTRA utilizes the latent spaces of normalizing flows to define nonconformity scores based on distances from the center. This allows for the mapping of high-density regions in latent space to sharp prediction regions in the output space, surpassing traditional hyperrectangular or elliptical conformal regions. Further, for scenarios where other predictive models are favored over flow-based models, we extend CONTRA to enhance any such model with a reliable prediction region by training a simple normalizing flow on the residuals. We demonstrate that both CONTRA and its extension maintain guaranteed coverage probability and outperform existing methods in generating accurate prediction regions across various datasets. We conclude that CONTRA is an effective tool for (conditional) density estimation, addressing the under-explored challenge of delivering multi-dimensional prediction regions.
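The core recipe can be sketched in Python: a nonconformity score equal to the latent distance from the center, followed by a split-conformal quantile. As a simplifying assumption, the "flow" below is a single affine whitening map fitted to the data, which can only produce elliptical regions. A trained nonlinear flow is what allows CONTRA to produce sharper, non-elliptical regions, and the conditional (input-dependent) version is not shown.

```python
import numpy as np

class AffineFlow:
    """Stand-in for a trained normalizing flow: z = L^{-1} (y - mu) with Cov(y) = L L^T."""

    def fit(self, y):
        self.mu = y.mean(axis=0)
        self.L = np.linalg.cholesky(np.cov(y, rowvar=False))
        return self

    def to_latent(self, y):
        return np.linalg.solve(self.L, (y - self.mu).T).T

def conformal_radius(flow, y_cal, alpha):
    """Split-conformal quantile of latent distances from the center (the origin)."""
    scores = np.linalg.norm(flow.to_latent(y_cal), axis=1)
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))   # finite-sample corrected rank
    return np.inf if k > n else np.sort(scores)[k - 1]

def in_region(flow, y, radius):
    """Membership test for the region {y : ||z(y)|| <= radius}."""
    return np.linalg.norm(flow.to_latent(y), axis=1) <= radius

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
y = rng.multivariate_normal([0.0, 0.0], cov, size=3000)
flow = AffineFlow().fit(y[:1000])                 # "training" split
r = conformal_radius(flow, y[1000:2000], 0.1)     # calibration split
coverage = in_region(flow, y[2000:], r).mean()    # held-out split; close to 0.9
print(coverage)
```

The coverage guarantee comes from exchangeability of calibration and test points, not from the flow being accurate. A better flow changes only the shape and size of the region.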

2605.08559 2026-05-12 math.FA cs.LG cs.NA cs.NE math.NA math.OC

Structure-Preserving Reconstruction of Convex Lipschitz Functionals on Hilbert Spaces from Finite Samples

Anastasis Kratsios

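AI summary: This paper studies how to reconstruct a convex Lipschitz functional on a separable Hilbert space from finitely many sample points. The author gives an explicit finite-sample reconstruction that preserves convexity and the Lipschitz property while achieving any prescribed accuracy. The method needs only finitely many linear measurements and can be implemented exactly by a ReLU neural network. This leads to convex neural functionals (CNFs), a structured trainable model class that provides a principled foundation for learning convex functionals from finite data.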

Abstract

Convex functionals are ubiquitous in applied analysis, appearing as value functions, risk measures, super-hedging prices, and loss functionals in machine learning. In many applications, however, the functional is only observed through finitely many exact pointwise evaluations. We ask whether a convex functional on a separable Hilbert space $H$ can be reconstructed, up to arbitrary uniform accuracy, by an explicit formula which preserves convexity and Lipschitz regularity and is finitely computable. We answer this affirmatively. For every compact convex $C\subseteq H$, every $L$-Lipschitz convex functional $\rho:C\to\mathbb{R}$, and every $\varepsilon>0$, we construct an explicit finite-sample reconstruction which is convex, $L$-Lipschitz, and uniformly $\varepsilon$-accurate on $C$. The construction uses only finitely many linear measurements $\langle b,\cdot\rangle_H$, with $b$ lying in a finite-dimensional subspace of $H$, and is exactly implementable by a $\operatorname{ReLU}$-MLP. Building on this, we introduce convex neural functionals (CNFs), a structured trainable architecture class containing our reconstruction, whose every admissible parameter configuration is automatically convex and Lipschitz, providing a principled foundation for learning convex functionals from finite data.
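Two generic ingredients behind this kind of result are easy to illustrate. First, a maximum of finitely many affine functionals $x \mapsto \langle b_i, x\rangle + c_i$ is convex and Lipschitz with constant $\max_i \|b_i\|$. Second, that maximum is computed exactly by ReLUs via $\max(u, v) = v + \mathrm{ReLU}(u - v)$. The Python sketch below works in $\mathbb{R}^3$ as a stand-in for a finite-dimensional subspace of $H$ and uses random, hypothetical slopes and offsets. It does not reproduce how the paper chooses $b_i$ and $c_i$ from pointwise samples.

```python
import numpy as np

def relu(t):
    return np.maximum(t, 0.0)

def max_affine(x, B, c):
    """f(x) = max_i (<b_i, x> + c_i): convex, Lipschitz with constant max_i ||b_i||."""
    return float(np.max(B @ x + c))

def max_affine_relu(x, B, c):
    """Same value using only linear maps and ReLUs: max(u, v) = v + relu(u - v)."""
    vals = B @ x + c                 # finitely many linear measurements <b_i, x>, shifted
    out = vals[0]
    for v in vals[1:]:
        out = v + relu(out - v)
    return float(out)

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 3))      # hypothetical slopes b_i
c = rng.standard_normal(5)           # hypothetical offsets c_i
x, y = rng.standard_normal(3), rng.standard_normal(3)
lip = np.linalg.norm(B, axis=1).max()

def f(z):
    return max_affine_relu(z, B, c)

same = np.isclose(max_affine(x, B, c), f(x))                      # ReLU form is exact
convex_mid = f((x + y) / 2) <= (f(x) + f(y)) / 2 + 1e-12          # midpoint convexity
lipschitz = abs(f(x) - f(y)) <= lip * np.linalg.norm(x - y) + 1e-12
print(same, convex_mid, lipschitz)
```

Because every parameter setting of this form is convex and Lipschitz by construction, training it cannot break either property. This is the kind of guarantee the abstract attributes to CNFs.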

2605.08553 2026-05-12 cs.SE cs.AI cs.LG

VeriContest: A Competitive-Programming Benchmark for Verifiable Code Generation

Zichen Xie, Mrigank Pawagi, Yuxin Liu, Aaditi Rai, Lize Shao, John Berberian, Sicong Che, Wenxi Wang

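AI summary: VeriContest is a competitive-programming benchmark for verifiable code generation. It contains 946 problems from LeetCode and Codeforces, targeting Rust with the Verus verifier. Each problem comes with a natural-language description, expert-validated formal specifications, judge-accepted code, formal proofs, and test cases, so specification generation, code generation, proof generation, and end-to-end verification can each be evaluated separately. The benchmark is built through a three-phase pipeline that combines manual verification with semi-automated expansion and uses testing as an additional quality-assurance layer. It reveals a large gap between current models' ordinary code generation and their verifiable code generation, and it provides a rigorous evaluation platform for future research.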

Abstract

Large language models can generate useful code from natural language, but their outputs come without correctness guarantees. Verifiable code generation offers a path beyond testing by requiring models to produce not only executable code, but also formal specifications and machine-checkable proofs. Progress in this direction, however, is difficult to measure: existing benchmarks are often small, focus on only one part of the pipeline, lack ground-truth proofs or rigorous specification validation, or target verification settings far from mainstream software development. We present VeriContest, a benchmark of 946 competitive-programming problems from LeetCode and Codeforces for verifiable code generation in Rust with Verus. Each problem pairs a natural language description with expert-validated formal specifications, judge-accepted Rust code, Verus-checked proofs, and positive and negative test suites. VeriContest is constructed through a three-phase pipeline that scales from manually verified seed problems to semi-automated expansion with human-in-the-loop review. To further strengthen benchmark quality, we use testing as an additional quality-assurance layer for validating postcondition completeness. VeriContest supports isolated and compositional evaluation of specification generation, code generation, proof generation, and end-to-end verified program synthesis. Evaluating ten state-of-the-art models reveals a sharp gap between coding ability and verifiable code generation: the strongest model reaches 92.18% on natural-language-to-code generation, but only 48.31% on specification generation, 13.95% on proof generation, and 5.29% end-to-end. These results identify proof and specification generation as the central bottlenecks for models and establish VeriContest as a rigorous platform for measuring and training future systems that generate code with machine-checkable correctness.

2605.08552 2026-05-12 stat.ML cs.LG

Learnability and Competition in High-Dimensional Multi-Component ICA

Eser Ilke Genc, Samet Demir, Zafer Dogan

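AI Summary: This paper studies learnability and competition in high-dimensional multi-component independent component analysis (ICA). It proposes an asymptotically exact mean-field theory that captures the coupling between estimated directions and true components during online learning. In the high-dimensional limit, the overlap matrix between estimates and true components satisfies a closed system of ordinary differential equations. From this system, two initialization-driven phases emerge: a decoupled regime and a competition regime. The theory gives explicit learnability boundaries and competition conditions linking the learning rate, data moments, and initialization. Experiments confirm the predicted trajectories and phase transitions.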

Comments 56 pages, 9 figures

Abstract

Independent Component Analysis (ICA) is a foundational tool for unsupervised representation learning, yet its high-dimensional theory remains largely limited to single-component recovery. We develop an asymptotically exact mean-field theory for multi-component online ICA, capturing the coupling induced by simultaneous learning and orthogonalization. In the high-dimensional limit, the joint empirical distribution of learned estimates and ground-truth components converges to a deterministic process, yielding a closed ODE system for the overlap matrix between learned directions and true components. This characterization reveals a genuinely multi-component, initialization-driven phase structure: a decoupled regime, where estimates align with distinct components and evolve nearly independently, and a competition regime, where overlapping initializations induce orthogonality-driven conflicts, slow reorientation, and delayed convergence. Our steady-state analysis gives explicit learnability boundaries and competition conditions linking step size, data moments, and initialization. These conditions show that larger higher-order moments and competition shrink the stable learning-rate window, increase convergence times, and predict a staircase phenomenon in which the number of recoverable components changes discretely with the learning rate. Experiments on synthetic data and hyperspectral remote sensing data validate the predicted trajectories and phase behavior.
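The two objects this abstract analyzes can be made concrete in a short sketch: an online update of $K$ directions followed by orthogonalization, and the overlap matrix between learned and true directions. The abstract does not state the exact update rule. The cubic nonlinearity `y**3`, the Gram-Schmidt orthogonalization via QR, the data model, and all function names below are assumptions made for illustration only.

```python
import numpy as np

def online_ica_step(W, x, lr):
    """One online update of K estimated directions (rows of W, shape K x d).

    Assumed rule: a stochastic step on a kurtosis-type contrast with
    nonlinearity g(y) = y**3, then Gram-Schmidt re-orthonormalization via QR.
    The orthonormalization is what couples the components.
    """
    y = W @ x                               # projections of the sample, shape (K,)
    W = W + lr * np.outer(y ** 3, x)        # per-component stochastic step
    Q, R = np.linalg.qr(W.T)                # orthonormalize the K rows
    Q = Q * np.sign(np.diag(R))             # keep each row's orientation
    return Q.T

def overlap_matrix(W, A):
    """M[i, j] = <learned direction i, true component j>."""
    return W @ A.T

rng = np.random.default_rng(0)
d, K = 200, 3
A = np.linalg.qr(rng.standard_normal((d, K)))[0].T   # true components (rows)
W = np.linalg.qr(rng.standard_normal((d, K)))[0].T   # random initialization
P = np.eye(d) - A.T @ A                              # projector onto the Gaussian bulk

for _ in range(500):
    s = rng.choice([-1.0, 1.0], size=K)              # non-Gaussian sources
    x = A.T @ s + P @ rng.standard_normal(d)         # sources plus Gaussian bulk
    W = online_ica_step(W, x, lr=1e-3 / d)

M = overlap_matrix(W, A)
```

Under QR, an update to one row can change every row after it. This is a simple instance of the orthogonality-driven coupling that the mean-field ODEs for the overlap matrix are said to capture.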

2605.08546 2026-05-12 stat.ML cs.LG math.OC

Sliced Inner Product Gromov-Wasserstein Distances

Xiaoyun Gong, Gabriel Rioux, Ziv Goldfeld

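AI Summary: This paper studies the scalability of the Gromov-Wasserstein distance with inner-product cost (IGW) for high-dimensional data. It proposes a sliced IGW distance with a natural rotational-invariance property and resolves the difficulty that the problem lacks a closed-form solution in one dimension. The method is supported by theoretical analysis and numerical experiments. It is applied to heterogeneous clustering of text data and to comparing language-model representations.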

Comments 49 pages, 8 figures

Abstract

The Gromov-Wasserstein (GW) problem provides a framework for aligning heterogeneous datasets by matching their intrinsic geometry, but its statistical and computational scaling remains an issue for high-dimensional problems. Slicing techniques offer an appealing route to scalability, but, unlike Wasserstein distances, GW problems do not generally admit closed-form solutions in one dimension. We resolve this problem for the GW problem with inner product cost (IGW), propose a sliced IGW distance that enjoys a natural rotational invariance property, and comprehensively study its structural and computational properties. Numerical experiments validating our theory are presented, followed by applications to heterogeneous clustering of text data and language model representation comparison.

2605.08528 2026-05-12 cs.MA cs.RO

SceneFactory: GPU-Accelerated Multi-Agent Driving Simulation with Physics-Based Vehicle Dynamics

Yicheng Zhu, Yang Chen, Tao Li, Zilin Bian

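AI Summary: This paper presents SceneFactory, a GPU-vectorized autonomous-driving simulation platform that enables efficient multi-agent simulation while preserving physical realism. Built on NVIDIA Isaac Sim and Isaac Lab, it represents worlds and agents as batched tensors, so control, observations, reward computation, and policy inference run in parallel on the GPU. Experiments show that, on the same hardware, SceneFactory achieves up to 127× higher simulation throughput than a non-vectorized baseline. They also show that physics-aware policies are effective under challenging conditions such as slippery roads.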

Abstract

Autonomous-driving simulators typically trade physical fidelity for scalable parallelism. Physics-based platforms such as CARLA and MetaDrive provide articulated vehicle dynamics and contact, but their non-vectorized interfaces make batched training difficult. GPU-batched systems such as Waymax and GPUDrive scale to hundreds of scenarios by replacing rigid-body physics with simplified kinematic models, omitting tire--road interaction, suspension, contact dynamics, and road-condition-dependent friction. We introduce SceneFactory, a GPU-vectorized platform for procedural scene construction, physics-based multi-agent simulation, and RL in autonomous-driving environments. Built on NVIDIA Isaac Sim + Isaac Lab, SceneFactory represents worlds and agents as batched tensors: control, observations, rewards, resets, and policy inference run as GPU tensor operations over the Isaac Lab tensor API. SceneFactory converts Waymo Open Motion Dataset road topologies into simulation-ready USD worlds, runs many worlds concurrently on one GPU, populates each with multiple articulated PhysX vehicles, and maps precipitation and road-surface type to PhysX material friction coefficients. With GPU vectorization, SceneFactory achieves up to 127$\times$ higher throughput than a non-vectorized PhysX baseline on the same GPU and physics solver, reaching 19,250 controlled-agent simulation steps per second at 256 worlds $\times$ 16 agents. Cross-simulator transfer reveals an asymmetric dynamics gap: physics-grounded RL policies transfer to a simplified kinematic bicycle model with 99.5% success, whereas reverse transfer drops to 47.3%. Under wet-road friction, friction-aware policies reduce mean peak DRAC from 58.7 to 27.8 m/s$^2$ without sacrificing goal reach. SceneFactory shows that scalable autonomous-driving training need not discard articulated rigid-body dynamics or physically grounded road-condition variation.
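The safety metric reported above, DRAC (deceleration rate to avoid a crash), has a common car-following definition. It is the braking the follower needs to match the leader's speed before the gap closes: $(v_f - v_\ell)^2 / (2d)$ when $v_f > v_\ell$, and 0 otherwise. The sketch below assumes this standard form and takes the peak over a trajectory. The abstract does not say which DRAC variant or gap definition SceneFactory uses, and the function names are hypothetical.

```python
def drac(v_follow, v_lead, gap):
    """Deceleration rate to avoid a crash, in m/s^2 (standard car-following form).

    v_follow, v_lead: speeds in m/s along the lane; gap: distance between
    the vehicles in m. Returns 0.0 when the follower is not closing in.
    """
    if gap <= 0:
        raise ValueError("gap must be positive")
    closing_speed = v_follow - v_lead
    if closing_speed <= 0:
        return 0.0
    return closing_speed ** 2 / (2.0 * gap)

def peak_drac(samples):
    """Peak DRAC over a trajectory given as (v_follow, v_lead, gap) tuples."""
    return max((drac(*s) for s in samples), default=0.0)

trajectory = [(15.0, 10.0, 10.0), (20.0, 10.0, 5.0), (12.0, 14.0, 8.0)]
peak = peak_drac(trajectory)
```

In the second sample the follower closes at 10 m/s with a 5 m gap, so it must brake at 10 m/s^2. That sample sets the peak.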

2605.08527 2026-05-12 cs.DC cs.AI

MARLaaS: Multi-Tenant Asynchronous Reinforcement Learning as a Service

Timothy Tin Long Yu, Gursimran Singh, Ge Shi, Hanieh Sadri, Yong Zhang, Zhenan Fan

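AI Summary: MARLaaS is a multi-tenant, asynchronous reinforcement-learning-as-a-service system designed to reduce the compute cost of fine-tuning large language models. It shares a base model across tenants through lightweight LoRA adapters and uses a staged asynchronous architecture to train multiple tasks concurrently. This design reduces cross-task interference and idle time, significantly improving hardware utilization and shortening end-to-end training time.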

Abstract

Reinforcement Learning from Verifiable Rewards (RLVR) has significantly improved the reasoning capabilities of large language models (LLMs), particularly in multi-turn agentic settings involving environment interaction like tool use. However, fine-tuning such models remains prohibitively expensive due to high computational requirements, limiting accessibility. We propose MARLaaS (Multi-tenant Asynchronous RL as a Service), a system for concurrent RL fine-tuning across multiple users and tasks. Our approach is based on two key ideas: (1) sharing a base model across tenants using lightweight LoRA adapters, and (2) a disaggregated asynchronous architecture that decouples rollout generation, environment interaction, and policy training into independently scheduled stages. This design enables tasks to progress through the RL pipeline at their own pace in an event-driven manner, reducing cross-task interference, idle time, and end-to-end latency. In multi-task settings (we report up to 32 concurrent tasks), MARLaaS achieves single-task state-of-the-art performance while improving accelerator utilization by up to 4.3x and reducing end-to-end training time by 85%.
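The first key idea, many tenants sharing one frozen base model through per-tenant LoRA adapters, can be sketched for a single linear layer. This is a hypothetical NumPy illustration of the standard LoRA form $y = xW + \frac{\alpha}{r} x A_t B_t$, with made-up class and tenant names. It does not model MARLaaS's scheduling or its asynchronous rollout, environment, and training stages.

```python
import numpy as np

class SharedBaseLoRA:
    """A frozen base weight shared by all tenants, plus one low-rank adapter per tenant.

    Output for tenant t: y = x @ W + (alpha / rank) * (x @ A_t) @ B_t.
    """

    def __init__(self, W, rank, alpha=1.0, seed=0):
        self.W = W                      # shared and frozen, shape (d_in, d_out)
        self.rank = rank
        self.scale = alpha / rank
        self.adapters = {}              # tenant id -> (A_t, B_t)
        self._rng = np.random.default_rng(seed)

    def add_tenant(self, tenant):
        d_in, d_out = self.W.shape
        A = 0.01 * self._rng.standard_normal((d_in, self.rank))
        B = np.zeros((self.rank, d_out))  # zero init: a new adapter starts as a no-op
        self.adapters[tenant] = (A, B)

    def forward(self, x, tenant):
        A, B = self.adapters[tenant]
        return x @ self.W + self.scale * (x @ A) @ B

    def adapter_params(self):
        d_in, d_out = self.W.shape
        return self.rank * (d_in + d_out)  # trainable parameters per tenant

rng = np.random.default_rng(1)
layer = SharedBaseLoRA(rng.standard_normal((64, 32)), rank=4)
for t in ("tenant_a", "tenant_b"):
    layer.add_tenant(t)

x = rng.standard_normal((2, 64))
A_a, _ = layer.adapters["tenant_a"]
layer.adapters["tenant_a"] = (A_a, rng.standard_normal((4, 32)))  # pretend tenant_a was trained
y_a = layer.forward(x, "tenant_a")
y_b = layer.forward(x, "tenant_b")
```

Here each tenant adds 384 trainable parameters, compared with 2,048 in the shared 64×32 base matrix. An untrained tenant's output equals the base model's output.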

2605.08499 2026-05-12 cs.IR cs.AI

Multi-Level Graph Attention Network Contrastive Learning for Knowledge-Aware Recommendation

Zhifei Hu, Feng Xia

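AI Summary: This paper targets three problems in knowledge-graph-enhanced recommender systems: sparse labels, insufficient graph-structure learning, and noisy knowledge entities. It proposes a multi-view graph contrastive learning framework. The framework enhances user representations through multi-view knowledge-graph distillation and builds more informative item representations from neighboring-entity information. It also includes a multi-level self-supervised contrastive learning module that contrasts at the Inter-Level, Intra-Level, and Interaction-Level to improve generalization and discrimination. Experiments show that the framework outperforms existing state-of-the-art methods on several public datasets.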

Abstract

In recent years, the use of edge information provided by knowledge graphs together with the advantages of higher-order connectivity in graph neural networks for recommendation systems has become an important research direction. However, existing approaches are often limited by sparse labels, insufficient graph structure learning, and noisy entities in the knowledge graph, which reduce recommendation accuracy. To address these limitations, we propose a multi-view graph contrastive learning framework. The proposed method enhances user representations through multi-view knowledge graph distillation, enabling more accurate modeling of user preferences over entities and relations. The network aggregates neighborhood entity information to construct informative item representations. Furthermore, we design a multi-level self-supervised contrastive learning module that performs comparisons across three perspectives: Inter-Level, Intra-Level, and Interaction-Level. This design improves the model's ability to generalize across intra-class samples while increasing discrimination between inter-class samples, thereby enabling more effective multi-dimensional feature modeling. We conduct extensive experiments on three public datasets using both baseline and ablation settings. Experimental results demonstrate that the proposed framework consistently outperforms existing state-of-the-art methods. Ablation studies further verify the effectiveness of each module in the proposed model.
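The abstract describes the multi-level contrastive module only at a high level. Objectives of this kind are commonly built from an InfoNCE loss, so the sketch below assumes that form. The temperature and the way views are paired are illustrative choices, not taken from the paper. One plausible way to instantiate the three levels is to change what counts as the two views of the same user or item.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.2):
    """InfoNCE loss between two views; row i of z1 and row i of z2 form the positive pair.

    All other rows of z2 act as in-batch negatives for row i of z1.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                     # cosine similarities, (N, N)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))            # -log p(positive) per row

rng = np.random.default_rng(0)
z1 = rng.standard_normal((16, 8))
z2 = z1 + 0.01 * rng.standard_normal((16, 8))             # two views of the same 16 users
aligned = info_nce(z1, z2)
mismatched = info_nce(z1, np.roll(z2, 1, axis=0))         # positives deliberately broken
```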

2605.08488 2026-05-12 math.OC cs.LG

A Unified Lyapunov-IQC Framework for Uniform Stability of Smooth Quadratic First-Order Accelerated Optimizers

Don Li, Dacian Daescu

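AI Summary: This paper proposes a unified Lyapunov-integral quadratic constraint (IQC) framework for analyzing the uniform stability of first-order accelerated optimization algorithms on smooth quadratic objectives. It uses Lyapunov functions and IQC inequalities to model the algorithm dynamics as a feedback interconnection between a linear system and the gradient operator. This turns stability analysis into a linear matrix inequality feasibility problem that can be solved by semidefinite programming. The framework applies to Nesterov's accelerated gradient method and offers a new view of the structural link between optimization dynamics and robust control theory. It also provides a modular way to verify the stability of complex optimization algorithms.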

Abstract

We develop a unified Lyapunov-integral quadratic constraint (IQC) framework for establishing uniform stability of first-order accelerated optimization algorithms in the $β$-smooth and $γ$-strongly convex regime. Classical analyses of uniform stability, such as the work of Hardt, Recht, and Singer for stochastic gradient descent (SGD), rely on direct coupling arguments and case-by-case control of iterate differences under random sampling. Extending such arguments to accelerated methods, such as Nesterov Accelerated Gradient (NAG), is complicated by the presence of higher-order state dynamics induced by momentum. We first extend this classical approach with the use of Lyapunov functions to provide a uniform stability bound for smooth quadratic NAG, and supplement this result with small-scale numerical experiments. We then extend this framework by modeling first-order accelerated optimizers as Lur'e-type feedback interconnections between a linear dynamical system and a (non-linear) gradient operator. $β$-Smoothness and $γ$-strong convexity are encoded as a sector IQC inequality. Under this representation, uniform stability is certified via the existence of a quadratic Lyapunov function satisfying a finite-dimensional linear matrix inequality (LMI) in the form of a feasibility problem, which can be solved via semi-definite programming (SDP). We instantiate this framework for NAG and show how classical uniform stability bounds can be recovered via this framework. These results underscore a structural connection between optimization dynamics and robust control theory, providing a modular methodology for reliable and reproducible numerical certification of uniform stability and generalization behavior of first-order methods via convex optimization tools that is adaptable to increasingly complex optimization algorithms.
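On a quadratic objective, the "linear dynamical system" side of the interconnection can be written out explicitly. The sketch below is an illustration only, not the paper's LMI certificate. NAG with step $1/\beta$ and momentum $m = (\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)$, where $\kappa = \beta/\gamma$, is a linear map on the stacked state $(x_k, x_{k-1})$. For this parameter choice, the worst eigenvalue occurs at curvature $\gamma$. It is a double root at $1 - 1/\sqrt{\kappa}$, so the spectral radius is $0.9$ when $\kappa = 100$. The function name and the test Hessian are hypothetical.

```python
import numpy as np

def nag_iteration_matrix(H, step, momentum):
    """T such that [x_{k+1}; x_k] = T [x_k; x_{k-1}] for NAG on f(x) = 0.5 x^T H x.

    NAG: y_k = x_k + m (x_k - x_{k-1}),  x_{k+1} = y_k - step * H y_k.
    """
    n = H.shape[0]
    G = np.eye(n) - step * H                 # a gradient step on a quadratic is linear
    top = np.hstack([(1 + momentum) * G, -momentum * G])
    bottom = np.hstack([np.eye(n), np.zeros((n, n))])
    return np.vstack([top, bottom])

beta, gamma = 10.0, 0.1                      # smoothness and strong-convexity constants
H = np.diag(np.linspace(gamma, beta, 5))     # Hessian spectrum spans [gamma, beta]
kappa = beta / gamma
step = 1.0 / beta
momentum = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)

T = nag_iteration_matrix(H, step, momentum)
rho = max(abs(np.linalg.eigvals(T)))         # spectral radius of the linear dynamics
```

A spectral radius below 1 does not by itself mean the map contracts at every step. A quadratic Lyapunov function of the kind the LMI searches for certifies contraction in a weighted norm, which is the property that stability arguments rely on.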

2605.08485 2026-05-12 stat.ML cs.LG math.ST stat.ME stat.TH

Sinkhorn Treatment Effects: A Causal Optimal Transport Measure

Medha Agarwal, Alex Luedtke

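AI Summary: This paper introduces the Sinkhorn treatment effect, a causal optimal transport measure of the difference between counterfactual distributions. Because it is built on entropy-regularized optimal transport, it captures differences across whole distributions rather than only the average treatment effect. The authors express it as a smooth transformation of counterfactual mean embeddings. This lets them establish pathwise differentiability, construct debiased estimators, and derive asymptotically valid tests for distributional treatment effects. Experiments on simulated and image data show good practical performance.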

Comments 55 pages, 6 figures

Abstract

We introduce the Sinkhorn treatment effect, an entropic optimal transport measure of divergence between counterfactual distributions. Unlike classical quantities such as the average treatment effect, this measure captures differences across entire distributions. We analyze this divergence as a statistical functional and show it can be written as a smooth transformation of counterfactual mean embeddings with an appropriate kernel. This characterization allows us to establish first-order pathwise differentiability in general, and second-order pathwise differentiability under the null hypothesis of equal counterfactual distributions. Leveraging this smoothness, we construct debiased estimators and use them to obtain asymptotically valid tests for distributional treatment effects with a fixed entropic regularization parameter. Because the power of the test depends on this unknown parameter, we further propose an aggregated test that combines evidence across a grid of regularization choices. Experiments on simulated and image data demonstrate the practical advantages of our estimator and testing procedure.
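The measure rests on entropic optimal transport. The sketch below computes an entropic OT cost between two samples with plain Sinkhorn iterations. It then combines three such costs into the common debiased Sinkhorn divergence, $\mathrm{OT}_\varepsilon(x,y) - \tfrac12\mathrm{OT}_\varepsilon(x,x) - \tfrac12\mathrm{OT}_\varepsilon(y,y)$. It assumes uniform weights, squared Euclidean cost, and a fixed `eps`. It does not reproduce the paper's counterfactual weighting, mean-embedding representation, or debiased estimator, and the sample names are illustrative.

```python
import numpy as np

def entropic_ot(x, y, eps=1.0, n_iter=500):
    """Entropic OT between uniform empirical measures on x (n, d) and y (m, d).

    Returns <P, C> + eps * KL(P || a b^T) and the transport plan P.
    """
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)  # squared Euclidean cost
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    K = np.exp(-C / eps)                                     # Gibbs kernel
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):                                  # alternate marginal fits
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]
    kl = np.sum(P * np.log(P / np.outer(a, b)))
    return float(np.sum(P * C) + eps * kl), P

def sinkhorn_divergence(x, y, eps=1.0):
    """Debiased form; it is exactly zero when x and y are the same sample."""
    return (entropic_ot(x, y, eps)[0]
            - 0.5 * entropic_ot(x, x, eps)[0]
            - 0.5 * entropic_ot(y, y, eps)[0])

rng = np.random.default_rng(0)
treated = rng.standard_normal((40, 2))
control_same = rng.standard_normal((40, 2))           # same distribution
control_shifted = rng.standard_normal((40, 2)) + 2.0  # shifted distribution
sd_same = sinkhorn_divergence(treated, control_same)
sd_shifted = sinkhorn_divergence(treated, control_shifted)
_, P = entropic_ot(treated, control_shifted)
```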

2605.08460 2026-05-12 cs.CR cs.AI

When Child Inherits: Modeling and Exploiting Subagent Spawn in Multi-Agent Networks

Ziwen Cai, Yihe Zhang, Xiali Hei

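AI Summary: This paper studies the security risks of subagent spawning in multi-agent networks. Once a parent agent is compromised, its inherited memory can pass malicious instructions, stale state, or unintended behavioral rules to newly spawned subagents, widening the scope of an attack. The authors analyze insecure memory inheritance, weak resource control, and related issues in current mainstream frameworks. They show that inheritance is central to multi-agent system security and propose defenses based on explicit security invariants.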

Abstract

Since the official release of ChatGPT in 2022, large language models (LLMs) have rapidly evolved from chatbot-style interfaces into agentic systems that can delegate work through tools and newly spawned subagents. While these capabilities improve automation and scalability, they also pose new security risks in multi-agent networks. Existing research has studied how individual LLM-based agents can be compromised through prompt injection, jailbreaking, poisoned retrieval data, or malicious extensions. Less is known about what happens after one agent is compromised inside a multi-agent network. In particular, inherited memory from parent agents can carry malicious instructions, outdated states, or unintended behavioral rules into newly created subagents, allowing a local compromise to spread across agent boundaries. In this paper, we model contemporary multi-agent networks through the lens of subagent inheritance. Our analysis shows that current frameworks can violate trust boundaries through insecure memory inheritance, weak resource control, stale post-spawn state, and improper termination authority. We demonstrate these risks in real agent frameworks and propose defenses based on explicit security invariants. Our findings show that inheritance is not merely an implementation detail, but a central component influencing the security of multi-agent systems.

2605.08456 2026-05-12 cs.CR cs.LG

HEART: A High-Efficiency Adaptive Real-Time Telemonitoring Framework for Secure Electrocardiogram Signal Transmission Using Chaotic Encryption

Beyazıt Bestami Yuksel

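AI Summary: This paper proposes HEART, a high-efficiency, adaptive, real-time electrocardiogram (ECG) telemonitoring framework. HEART derives learnable encryption keys from features of each patient's own ECG signal and encrypts the ECG in real time for transmission, protecting data privacy while preserving diagnostic accuracy. The method uses chaotic encryption combined with dynamic key generation and biometric key refresh to strengthen security and resistance to attacks. Experiments show that the framework achieves high-fidelity signal reconstruction with low encryption latency, giving good real-time performance and diagnostic reliability.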

Comments 15 pages, 4 figures, 3 tables

Journal ref
ELECTRICA 2026
Abstract

The real-time analysis and secure transmission of electrocardiogram (ECG) signals are critical for accurate diagnosis and for safeguarding patient privacy in telemedicine applications. This study presents a novel real-time ECG monitoring system that uses a learnable key generator (LKG), derived from each patient's own ECG signal characteristics, to dynamically produce unique encryption keys. These keys determine the parameters $r$ and $x_0$ of a logistic map used for chaotic encryption. The system encrypts real-time ECG data immediately after acquisition, ensuring confidential transmission and storage in the cloud. For remote clinical access, the encrypted data is downloaded and decrypted on the doctor's side using the matching key, either generated at the source or securely stored in the cloud. This approach eliminates the need for traditional key exchange. It also substantially raises the practical cost of exhaustive key search through per-segment biometric key refresh and combined permutation and XOR diffusion, supported by min-entropy evaluation. Compared to static-key methods, the learnable biometric key design offers greater unpredictability and individualization. A comprehensive set of security assessments confirms the method's strength: Shannon entropy (7678 bits), correlation and autocorrelation disruption, histogram statistics, NIST SP 800-22 frequency testing, plaintext and key sensitivity, avalanche effect, FFT-based spectral flatness, and robustness to noise and occlusion. Reconstruction fidelity (MSE approximately $5\times10^{-6}$, PSNR greater than 52 dB, MAE approximately 0.002) demonstrates near-lossless decryption and preserved diagnostic features. Encryption latency remains low, preserving real-time performance.
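The cipher structure described here uses the logistic-map parameters $r$ and $x_0$ as the key, applies a chaotic permutation, and then XORs with a chaotic keystream. A toy Python sketch of that structure follows. The byte quantization, burn-in length, the way the permutation is derived from the chaotic sequence, and the key values are all assumptions. The learnable key generator and per-segment key refresh are not modeled, so `r` and `x0` are simply passed in.

```python
import math

def _key_material(r, x0, n, burn_in=100):
    """Chaotic permutation and XOR keystream for n samples from the logistic map x <- r x (1 - x)."""
    x = x0
    for _ in range(burn_in):                  # discard transient iterates
        x = r * x * (1.0 - x)
    seq = []
    for _ in range(2 * n):
        x = r * x * (1.0 - x)
        seq.append(x)
    perm = sorted(range(n), key=lambda i: seq[i])       # rank order of the first n states
    keystream = [int(s * 256) % 256 for s in seq[n:]]   # next n states quantized to bytes
    return perm, keystream

def encrypt(samples, r, x0):
    """samples: quantized ECG values in 0..255. Permute, then XOR with the keystream."""
    perm, ks = _key_material(r, x0, len(samples))
    shuffled = [samples[p] for p in perm]
    return [s ^ k for s, k in zip(shuffled, ks)]

def decrypt(cipher, r, x0):
    perm, ks = _key_material(r, x0, len(cipher))
    shuffled = [c ^ k for c, k in zip(cipher, ks)]
    plain = [0] * len(cipher)
    for dst, src in enumerate(perm):          # undo the permutation
        plain[src] = shuffled[dst]
    return plain

ecg = [int(128 + 100 * math.sin(i / 5)) for i in range(64)]  # stand-in for one quantized segment
key = (3.99, 0.412)                                           # hypothetical (r, x0)
cipher = encrypt(ecg, *key)
```

Because the logistic map is sensitive to its initial condition, decrypting with $x_0 = 0.413$ instead of $0.412$ yields an unrelated sequence. This toy version only shows the mechanism, not the security level the abstract reports.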

2605.05682 2026-05-12 cs.HC cs.AI cs.CY

PersonaTeaming: Supporting Persona-Driven Red-Teaming for Generative AI

Wesley Hanwen Deng, Mingxi Yan, Sunnie S. Y. Kim, Akshita Jha, Lauren Wilcox, Kenneth Holstein, Motahhare Eslami, Leon A. Gatys

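AI Summary: This study proposes persona-driven red-teaming (PersonaTeaming) to improve safety evaluation of generative AI, using diverse persona perspectives to enrich adversarial attack strategies. Its PersonaTeaming Workflow incorporates persona information into adversarial prompt generation and achieves a higher attack success rate than an existing state-of-the-art method while maintaining prompt diversity. To support human-AI collaboration, the study also develops PersonaTeaming Playground, an interactive interface that lets red-teamers define their own personas and work with AI to refine attack prompts. A user study shows that the approach elicits diverse attack strategies and encourages red-teamers' creativity.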

Abstract

Recent developments in AI safety research have called for red-teaming methods that effectively surface potential risks posed by generative AI models, with growing emphasis on how red-teamers' backgrounds and perspectives shape their strategies and the risks they uncover. While automated red-teaming approaches promise to complement human red-teaming through larger-scale exploration, existing automated approaches do not account for human identities and rarely incorporate human inputs. In this work, we explore persona-driven red-teaming to advance both automated red-teaming and human-AI collaboration. We first develop PersonaTeaming Workflow, which incorporates personas into the adversarial prompt generation process to explore a wider spectrum of adversarial strategies. Compared to RainbowPlus, a state-of-the-art automated red-teaming method, PersonaTeaming Workflow achieves higher attack success rates while maintaining prompt diversity. However, since automated personas only approximate real human perspectives, we further instantiate PersonaTeaming Workflow as PersonaTeaming Playground, a user-facing interface that enables red-teamers to author their own personas and collaborate with AI to mutate and refine prompts. In a user study with 11 industry practitioners, we found that PersonaTeaming Playground enabled diverse red-teaming strategies and outputs that practitioners perceived as useful, and that AI-generated suggestions in the PersonaTeaming Playground encouraged out-of-the-box thinking even when practitioners did not follow them strictly. Together, our work advances both automated and human-in-the-loop approaches to red-teaming, while shedding light on interaction patterns and design insights for supporting human-AI collaboration in generative AI red-teaming.

2605.02416 2026-05-12 cs.IT cs.LG math.IT

Dueling DDQN-Based Adaptive Multi-Objective Handover Optimization for LEO Satellite Networks

Po-Heng Chou, Chiapin Wang, Chung-Chi Huang, Kuan-Hao Chen

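AI Summary: This paper proposes a multi-objective handover optimization framework for low Earth orbit (LEO) satellite networks, based on a dueling double deep Q-network (DDQN). The goal is to dynamically balance throughput, blocking probability, and switching cost. The dueling architecture strengthens learning and lets the method adapt to time-varying network conditions. Simulation results show that, under typical operating conditions, the method achieves higher throughput and lower blocking than conventional methods.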

Comments 6 pages, 5 figures, 1 table; submitted to 2026 IEEE Globecom

Abstract

In this paper, we propose a dueling double deep Q-network (DDQN)-based adaptive multi-objective handover framework for low Earth orbit (LEO) satellite networks. The proposed method enables dynamic trade-off learning among throughput, blocking probability, and switching cost under time-varying network conditions. Simulation results demonstrate that the proposed approach consistently outperforms conventional baselines, achieving up to 10.3% throughput improvement and near-zero blocking under typical operating conditions.
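Two standard components of a dueling double DQN fit in a few lines. The first is the dueling aggregation $Q(s,a) = V(s) + A(s,a) - \frac{1}{|\mathcal{A}|}\sum_{a'} A(s,a')$. The second is the double-DQN target, in which the online network selects the next action and the target network evaluates it. The scalarized handover reward is purely hypothetical, because the abstract does not say how throughput, blocking, and switching cost are combined or weighted.

```python
def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Double DQN: argmax from the online net, value from the target net."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best]

def handover_reward(throughput, blocked, switched, weights=(1.0, 5.0, 0.5)):
    """Hypothetical scalarization of the three objectives; the weights are illustrative."""
    w_tp, w_block, w_switch = weights
    return w_tp * throughput - w_block * float(blocked) - w_switch * float(switched)

q = dueling_q(1.0, [1.0, 2.0, 3.0])   # one state, three candidate satellites
r = handover_reward(throughput=2.0, blocked=False, switched=True)
y = double_dqn_target(r, 0.9, q_online_next=[0.2, 0.8, 0.5],
                      q_target_next=[1.0, 2.0, 3.0], done=False)
```

Here `y` is 3.3 because the online network picks action 1, whose target-network value is 2.0. A plain DQN target would use the target network's own maximum, 3.0, and give 4.2. Decoupling selection from evaluation is how double DQN reduces that overestimation.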

2605.01708 2026-05-12 cs.DC cs.AI cs.LG

SplitZip: Ultra Fast Lossless KV Compression for Disaggregated LLM Serving

Yipin Guo, Siddharth Joshi

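AI Summary: In large language model (LLM) serving systems, separating the prefill and decode phases makes KV cache transfer a performance bottleneck, especially for long-input and agentic workloads. To address this, the paper proposes SplitZip, a GPU-optimized lossless KV cache compression method. SplitZip exploits redundancy in floating-point exponents, combining fixed-length coding with a sparse escape stream for efficient compression and decompression. Experiments show that SplitZip significantly improves KV cache transfer efficiency in both BF16 and FP8 formats and speeds up end-to-end model serving.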

Abstract

Contemporary systems serving large language models (LLMs) have adopted prefill-decode disaggregation to better load-balance between the compute-bound prefill phase and the memory-bound decode phase. Under this design, prefill workers generate a KV cache that must be transferred to decode workers before token generation can begin. With these workers residing on different physical systems, this transfer becomes a significant bottleneck to serving LLMs at scale. This bottleneck is exacerbated for long-input and agentic workloads. Existing lossless codecs are not suited to this setting as they primarily target offline weight compression, run on the CPU, or use variable-length coding whose decompression is fast but compression is too slow to keep up with KV production during prefill. We introduce SplitZip, a GPU-friendly lossless compressor for KV cache transfer that preserves KV tensors bitwise and integrates into existing serving frameworks without changes to model execution. SplitZip exploits redundancy in floating-point exponents of KV activations, encoding the most frequent exponent values with fixed-length codes and routing rare exponents through a sparse escape stream of (position, value). An offline calibrated top-16 exponent codebook eliminates online histogramming, while the regular dense path and sparse escape correction make both encoding and decoding efficient on GPUs. On real BF16 activation tensors, SplitZip achieves $613.3$ GB/s compression throughput and $2181.8$ GB/s decompression throughput, substantially outperforming prior lossless compressors on the latency-critical codec path. End-to-end transfer experiments show up to $1.32\times$ speedup for BF16 KV cache transfer, $1.30\times$ speedup for TTFT, and a $1.23\times$ increase in request throughput. The same approach extends to FP8 KV caches, providing up to $1.14\times$ compression over native E5M2.
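The exponent-splitting scheme can be sketched in pure Python for BF16 words. Each 16-bit word is split into its 8 exponent bits and a byte holding the sign and 7 mantissa bits. Exponents found in a 16-entry calibrated codebook get a 4-bit code. Every other exponent goes to a sparse (position, value) escape list, which is applied after the dense decode. This sketch only illustrates the coding idea. The calibration data, function names, and placeholder code written at escaped positions are assumptions, and none of SplitZip's GPU kernel design is reflected.

```python
import math
import struct
from collections import Counter

def to_bf16_bits(x):
    """Bit pattern of x truncated to bfloat16 (upper 16 bits of its float32 encoding)."""
    return struct.unpack("<I", struct.pack("<f", x))[0] >> 16

def split(bits):
    """Split a BF16 word into (8 exponent bits, sign bit + 7 mantissa bits)."""
    return (bits >> 7) & 0xFF, ((bits >> 8) & 0x80) | (bits & 0x7F)

def join(exp, sign_mant):
    return ((sign_mant & 0x80) << 8) | (exp << 7) | (sign_mant & 0x7F)

def build_codebook(calibration_bits, k=16):
    """Offline: the k most frequent exponents, so codes fit in 4 bits when k = 16."""
    counts = Counter(split(b)[0] for b in calibration_bits)
    return [e for e, _ in counts.most_common(k)]

def encode(bits, codebook):
    index = {e: i for i, e in enumerate(codebook)}
    codes, side, escapes = [], [], []
    for pos, b in enumerate(bits):
        exp, sign_mant = split(b)
        side.append(sign_mant)                  # stored as-is, 8 bits
        if exp in index:
            codes.append(index[exp])            # dense fixed-length code
        else:
            codes.append(0)                     # placeholder, overwritten on decode
            escapes.append((pos, exp))          # sparse escape stream
    return codes, side, escapes

def decode(codes, side, escapes, codebook):
    exps = [codebook[c] for c in codes]         # regular dense path
    for pos, exp in escapes:                    # sparse correction
        exps[pos] = exp
    return [join(e, sm) for e, sm in zip(exps, side)]

calib = [to_bf16_bits(0.05 * (1 + i % 5) * math.cos(i)) for i in range(2000)]
kv = [to_bf16_bits(0.1 * (1 + i % 7) * math.sin(i)) for i in range(1000)] + [to_bf16_bits(1e10)]
codebook = build_codebook(calib)
codes, side, escapes = encode(kv, codebook)
```

Ignoring escapes, each value costs 4 + 8 = 12 bits instead of 16, a ratio of about 1.33×. The escape stream reduces that ratio in proportion to how often exponents fall outside the codebook. The outlier `1e10` above is one such escape.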

2605.01401 2026-05-12 cs.HC cs.AI

AI Expert Twin: Capturing Expert Cognition for Human-Centred, Practice-Based Learning

Annie Yuan, Xiaohua Chen, Kalina Yacef, Judy Kay

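AI Summary: This paper proposes the AI Expert Twin, a cognition-centric framework for capturing the tacit knowledge in expert practice, including procedural actions, semantic concepts, and decision processes. It also accounts for how value preferences, trade-offs, and uncertainty shape expert judgement. The framework formalises expert cognition as a three-layer structured representation, and a case study in a cultural heritage workshop demonstrates its feasibility. The authors argue that it transfers to domains such as vocational education and the creative industries. The approach offers a new path toward transparent, learner-centred AI educational systems.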

Comments 8 pages, 3 figures

Abstract

Tacit knowledge embedded in expert practice remains difficult to capture, formalise, and scale. While AI-driven educational systems have advanced personalisation, learner modelling, affective support, and self-regulated learning, they less often model the tacit reasoning and context-sensitive judgement that underpin expert practice in practice-based domains. This paper introduces the AI Expert Twin, a cognition-centric framework that models expert knowledge as structured, computable representations of procedural actions, semantic concepts, and decision processes. The framework also considers how value-laden preferences, trade-offs, and uncertainty shape expert judgement in practice. We formalise expert cognition as a three-layer representation and capture knowledge from experts under this model, laying the groundwork for integration into AI-powered educational systems. A case study in a cultural heritage workshop demonstrates the feasibility of the approach in a real-world setting. The framework is designed to be transferable across domains such as vocational education and creative industries. By embedding expert heuristics into AI while maintaining transparency and learner agency, the AI Expert Twin offers a novel path towards scalable, practice-based learning and invites further research on ethical, human-centred applications of AI in education.