arXivDaily arXiv Daily Academic Digest, updated Monday through Friday
2406.07125 2026-05-21 cs.CR cs.AI cs.LG

CARACAS: vehiCular ArchitectuRe for detAiled Can Attacks Simulation

Sadek Misto Kirdi, Nicola Scarano, Franco Oberti, Luca Mannella, Stefano Di Carlo, Alessandro Savino

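AI summary This paper presents CARACAS, a vehicle model for simulating detailed CAN attacks. It pairs simulation frameworks such as Simulink with a robust representation of attack models to generate synthetic CAN datasets for IDS development, and demonstrates attacks on torque control in a Battery Electric Vehicle.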

Comments 6 pages, 8 figures, TrustAICyberSec workshop - IEEE ISCC 2024

Journal ref
Proceedings of the 29th IEEE Symposium on Computers and Communications, ISCC 2024

Abstract

Modern vehicles are increasingly vulnerable to attacks that exploit network infrastructures, particularly the Controller Area Network (CAN) networks. To effectively counter such threats using contemporary tools like Intrusion Detection Systems (IDSs) based on data analysis and classification, large datasets of CAN messages become imperative. This paper delves into the feasibility of generating synthetic datasets by harnessing the modeling capabilities of simulation frameworks such as Simulink coupled with a robust representation of attack models to present CARACAS, a vehicular model, including component control via CAN messages and attack injection capabilities. CARACAS showcases the efficacy of this methodology, including a Battery Electric Vehicle (BEV) model, and focuses on attacks targeting torque control in two distinct scenarios.

2605.20531 2026-05-21 cs.LO cs.LG

Pseudo-Formalization for Automatic Proof Verification

Slim Barkallah, Luke Bailey, Kaiyue Wen, Mohammed Abouzaid, Tengyu Ma

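AI summary This paper proposes Pseudo-Formalization, a proof format that keeps the flexibility of natural language while retaining the modularity and precision of formal proofs. Paired with the Block Verification algorithm, it verifies natural language proofs module by module and outperforms existing baselines on error-finding precision and recall.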

Comments 31 pages, code available at https://github.com/Slim205/pseudo-formalization

Abstract

Reliable verification of proofs remains a bottleneck for training and evaluating AI systems on hard mathematical reasoning. Fully formal proofs, in languages like Lean, are easy to verify because they are unambiguous and modular. Most proofs, particularly those written by AI systems, have neither property, and translating them into formal languages remains challenging in many frontier math settings. We propose Pseudo-Formalization (PF), a proof format that captures the modularity and precision of formal proofs while retaining the flexibility of natural language. A Pseudo-Formal proof is decomposed into self-contained modules, each stating its premises, conclusion, and proof in natural language. To verify the correctness of a regular natural language proof, an LLM translates it to Pseudo-Formal and then verifies each module independently, an algorithm we call Block Verification (BV). We evaluate PF+BV on two benchmarks spanning olympiad and research-level mathematics, where it Pareto-dominates LLM-as-judge baselines on error-finding precision and recall. To support future work, we release our research-level proof verification benchmark ArxivMathGradingBench.
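
The module structure described in the abstract can be pictured with a minimal sketch. Everything named here (`Module`, `block_verify`, `toy_check`) is hypothetical and not taken from the paper or its repository; in particular, `toy_check` is a trivial stand-in for the LLM verifier and only flags empty proofs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Module:
    """One self-contained block: premises, conclusion, and a natural-language proof."""
    premises: List[str]
    conclusion: str
    proof: str


def block_verify(modules: List[Module], check: Callable[[Module], bool]) -> List[int]:
    """Check every module independently and return the indices of failing modules."""
    return [i for i, module in enumerate(modules) if not check(module)]


def toy_check(module: Module) -> bool:
    # Assumption: a real verifier would be an LLM call; this one only rejects empty proofs.
    return bool(module.proof.strip())


proof = [
    Module(["n is even"], "n^2 is even", "Write n = 2k; then n^2 = 2(2k^2)."),
    Module(["n^2 is even"], "n^2 + 1 is odd", ""),
]
failed = block_verify(proof, toy_check)
```

Because each module carries its own premises, a failure is localized to one index rather than to the proof as a whole.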

2605.20496 2026-05-21 q-bio.NC cs.CV

Platonic Representations in the Human Brain: Unsupervised Recovery of Universal Geometry

Pablo Marcos-Manchón, Rishi Jha, Lluís Fuentemilla

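AI summary This study asks whether a universal geometry can be recovered, without supervision, across human brains. A self-supervised encoder learns subject-specific embeddings from fMRI data, and the authors show that these embeddings can be translated across subjects through geometric transformations.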

Comments Code available at https://github.com/memory-formation/platonic-representations-fmri

Abstract

The Strong Platonic Representation Hypothesis suggests that representational convergence in artificial neural networks can be harnessed constructively: embeddings can be translated across models through a universal latent space without paired data. We ask whether an analogous geometry can be recovered across human brains. Using fMRI data from the Natural Scenes Dataset, we propose a self-supervised encoder that learns subject-specific embeddings from brain data alone by exploiting repeated stimulus presentations. We show that these independently learned spaces can be translated across subjects using unsupervised orthogonal rotations, without paired cross-subject samples or intermediate model representations. Synchronizing pairwise rotations into a single shared latent space further improves cross-subject retrieval, indicating that subject-specific spaces are mutually compatible with a common coordinate system. These results provide evidence for a shared neural geometry in the human visual cortex: subject-specific fMRI representations are approximately isometric across individuals and can be translated through purely geometric transformations.
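
As a toy illustration of what "isometric up to an orthogonal rotation" means, the sketch below rotates a small 2-D point set and confirms that pairwise distances are unchanged. The points and the angle are made up, and the sketch does not attempt the unsupervised recovery of the rotation that the abstract describes.

```python
import math


def rotate(points, theta):
    """Apply a 2-D rotation (an orthogonal map) to every point."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def pairwise_distances(points):
    """Euclidean distances between all unordered pairs, in a fixed order."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]


# Hypothetical "embeddings" for one subject, and a rotated copy for another.
subject_a = [(1.0, 0.0), (0.0, 2.0), (-1.5, 0.5)]
subject_b = rotate(subject_a, 0.7)

max_gap = max(
    abs(a - b)
    for a, b in zip(pairwise_distances(subject_a), pairwise_distances(subject_b))
)
```

The two point sets differ coordinate by coordinate, yet their distance structure is identical up to floating-point error, which is the property that makes purely geometric translation possible.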

2605.20473 2026-05-21 cs.SE cs.AI cs.LG

Code Generation by Differential Test Time Scaling

Yifeng He, Ethan Wang, Jicheng Wang, Xuanxin Ouyang, Hao Chen

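AI summary This paper proposes DiffCodeGen, a code generation method based on coverage-guided differential analysis. It generates diverse code candidates and uses coverage-guided fuzzing to synthesize inputs without existing test cases or large language models, which improves efficiency and scalability.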

Comments 16 main text, 21 pages with references

Abstract

Test-time scaling has emerged as a promising approach for improving code generation by exploring large solution spaces at inference time. However, existing methods often rely on public test cases that are unavailable in practice, or require extensive LLM inference for candidate selection, leading to significant token consumption and time overhead. We present DiffCodeGen, a novel test-time scaling method for code generation based on coverage-guided differential analysis. DiffCodeGen generates diverse code candidates using various sampling and prompting strategies, then applies coverage-guided fuzzing to synthesize inputs without requiring any existing tests or large language models. By executing all candidates on these inputs, DiffCodeGen captures their dynamic behavior and clusters candidates based on behavioral similarity. DiffCodeGen selects the medoid of the largest cluster as the final output. Unlike prior test-time scaling methods that invoke additional LLM inference for candidate selection, DiffCodeGen performs selection without any extra model calls, incurring little to no additional token consumption. DiffCodeGen is fully asynchronous, naturally suited to the current trend of agentic coding, and is thus efficient and highly scalable. We evaluate DiffCodeGen across 4 large language models, demonstrating consistent improvements over baselines. Compared to state-of-the-art test-time scaling methods, DiffCodeGen achieves competitive or superior performance while using only a fraction of time and tokens. DiffCodeGen is model-agnostic and can be combined with reasoning models to further boost performance.
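
The selection step (execute all candidates, cluster by behavior, return the medoid of the largest cluster) can be sketched as follows. This is a simplified illustration, not the authors' implementation: the input list stands in for fuzzer-generated inputs, clustering is a basic leader scheme over Hamming distance, and all function names are hypothetical.

```python
def behavior(candidate, inputs):
    """Run one candidate on every input; record each output or the exception type."""
    trace = []
    for x in inputs:
        try:
            trace.append(repr(candidate(x)))
        except Exception as exc:
            trace.append(type(exc).__name__)
    return tuple(trace)


def hamming(a, b):
    """Number of inputs on which two behavior traces disagree."""
    return sum(x != y for x, y in zip(a, b))


def select(candidates, inputs, threshold=0):
    """Return the index of the medoid of the largest behavioral cluster."""
    traces = [behavior(c, inputs) for c in candidates]
    clusters = []  # each cluster is a list of candidate indices
    for i, trace in enumerate(traces):
        for cluster in clusters:
            if hamming(trace, traces[cluster[0]]) <= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    largest = max(clusters, key=len)
    # Medoid: the member with the smallest total distance to its cluster.
    return min(largest, key=lambda i: sum(hamming(traces[i], traces[j]) for j in largest))


# Four hypothetical candidates for "absolute value"; candidate 2 is wrong on negatives.
candidates = [
    lambda x: abs(x),
    lambda x: x if x >= 0 else -x,
    lambda x: x,
    lambda x: -x if x < 0 else x,
]
inputs = [-3, -1, 0, 2, 5]  # stand-in for inputs synthesized by a fuzzer
chosen = select(candidates, inputs)
```

With `threshold=0` every member of a cluster behaves identically, so the medoid step only breaks ties; a positive threshold would let it pick the most central of several slightly different candidates. No model call is needed at any point in the selection.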

2605.20456 2026-05-21 cs.SE cs.AI cs.MA

Agentic Agile-V: From Vibe Coding to Verified Engineering in Software and Hardware Development

Christopher Koch

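AI summary This paper examines agentic AI in software and hardware development and proposes Agentic Agile-V, a framework whose SCOPE-V loop converts conversational intent into structured engineering artifacts. Its contributions are a taxonomy of minimum input artifacts, a conversation-to-contract gate, risk-adaptive workflows, and an evidence-bundle acceptance model.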

Comments 7 pages, 1 figure

Abstract

Agentic AI coding systems can inspect repositories, plan implementation steps, edit files, call tools, run tests, and submit pull requests. These capabilities make software and hardware development faster in some settings, but current evidence does not support the simple claim that autonomous code generation automatically improves engineering outcomes. Controlled studies report productivity gains in some enterprise tasks, slowdowns in mature open-source work, moderate but heterogeneous meta-analytic effects, and persistent failures in repository setup, dependency handling, permission gating, and hardware verification. This paper argues that the central problem is no longer prompt engineering; it is engineering process control. It synthesizes evidence from agentic software engineering, GitHub-scale adoption studies, repository-level agent configuration, productivity trials, issue-resolution benchmarks, and hardware/RTL verification research. It proposes Agentic Agile-V, a process framework that uses Agile-V as the lifecycle backbone and a task-level SCOPE-V loop - Specify, Constrain, Orchestrate, Prove, Evolve, and Verify - to convert conversational intent into structured engineering artifacts and acceptance evidence. The paper contributes: (i) a taxonomy of minimum input artifacts for agentic software, firmware, and hardware work; (ii) a conversation-to-contract gate that separates exploratory dialogue from implementation; (iii) risk-adaptive feature, bug-fix, testing, and hardware workflows; and (iv) an evidence-bundle acceptance model for agent-generated artifacts. The paper concludes that agentic AI does not eliminate engineering discipline; it increases the value of requirements, constraints, traceability, independent verification, and human approval.

2605.20442 2026-05-21 cs.HC cs.AI

Modeling Emotional Dynamics in Agent-to-Agent Interactions on Moltbook

Syed Mhamudul Hasan, Abdur R. Shahid

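AI summary This paper studies the emotional dynamics of agent-to-agent interactions on Moltbook. It proposes an emotion-aware framework that maps textual interactions to predefined fine-grained emotion categories and extracts structured emotion profiles, and it introduces the emotion-based Persona-Stimulus-Reaction (PSR) domain to evaluate behavioral reliability.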

Abstract

Generative AI systems are increasingly deployed as interactive agents in online environments, such as a social network called Moltbook. In Moltbook, large-scale agentic AIs can post, comment, and engage in activities generated at scale by AI-driven text. Yet the behavioral characteristics of these agents remain insufficiently understood, particularly in complex multi-agent interactions. In this study, we analyze the emotional dynamics of agent interactions within Moltbook. We construct an emotion-aware framework that maps textual interactions to a predefined set of fine-grained emotional categories, enabling the extraction of structured emotion profiles across agents and interaction contexts. To further evaluate behavioral reliability, we introduce an emotion-based domain called Persona-Stimulus-Reaction (PSR) that captures the alignment of emotional responses across similar contexts. Our analysis reveals that agents exhibit distinct emotional signatures, with varying levels of behavioral stability influenced by interaction context.

2605.20435 2026-05-21 cs.SI cs.CL

Hiding in Plain Sight: Finding MAHA on Reddit

Sabit Ahmed, Subigya Nepal, Henry Kautz

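AI summary This paper introduces a Reddit dataset spanning six years (2020-2025), with 19.4M posts from 4M users, for studying the structure, discourse, and contagion dynamics of the MAHA movement and the linguistic and behavioral patterns of its proponents.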

Comments Submitted to ASONAM 2026

Abstract

Make America Healthy Again (MAHA) is a national health movement that encompasses a striking mix of beliefs, from broadly accepted concerns about good diet and exercise to controversial takes on organic and genetically modified food, childhood vaccination, science, and institutions. Various influencers and promoters of the MAHA movement on social media are scattered throughout the online space. Investigating the structure, discourse, and contagion of MAHA beliefs requires large-scale fine-grained digital footprints. Constructing structured data covering different MAHA themes from vast unstructured social media data is challenging. We introduce a Reddit dataset that spans six years (2020-2025), comprising 19.4M posts from 4M users. Containing the natural and thematic context of 12 MAHA-aligned beliefs, this dataset offers researchers from various domains the opportunity to study the dynamics of the MAHA movement, its structural and functional components, and the linguistic and behavioral patterns of its proponents.

2605.20434 2026-05-21 stat.ML cs.DM cs.LG

Contradiction Graphs Determine VC Dimension

Jesse Campbell, Daniel Ibaibarriaga, Lev Reyzin

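AI summary This paper studies the contradiction graphs of binary concept classes and shows that a single order-$m$ contradiction graph determines whether the VC dimension is at least $m$, so the full sequence of graphs determines the exact VC dimension and distinguishes finite from infinite VC dimension.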

Abstract

We study the contradiction graphs associated with binary concept classes. For a class $H \subseteq \{0,1\}^X$, the order-$m$ contradiction graph $G_m(H)$ has as vertices the $H$-realizable labeled sequences of length $m$, with two vertices adjacent when the two sequences assign opposite labels to some common domain point. Our main result is that the single graph $G_m(H)$ determines the threshold predicate $\mathrm{VCdim}(H)\ge m$. Consequently, the full sequence $(G_m(H))_{m \ge 1}$ determines the exact VC dimension and, in particular, detects finite versus infinite VC dimension, answering a question posed by Alon et al. (2024).
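
For a finite domain, the definitions above can be made concrete with a brute-force sketch. It assumes a domain $X = \{0, \dots, n-1\}$, represents each concept as a tuple of labels, and allows repeated points within a sequence; these encoding choices and all function names are assumptions for illustration. The sketch only builds $G_m(H)$ and computes the VC dimension directly, and does not implement the paper's argument for reading one from the other.

```python
from itertools import combinations, product


def realizable_sequences(H, n_points, m):
    """All length-m labeled sequences ((x, y), ...) consistent with some h in H."""
    seqs = set()
    for h in H:
        for xs in product(range(n_points), repeat=m):
            seqs.add(tuple((x, h[x]) for x in xs))
    return seqs


def contradict(s, t):
    """True when s and t give opposite labels to some common domain point."""
    labels = dict(s)  # s is realizable, so each point has a single label in s
    return any(x in labels and labels[x] != y for x, y in t)


def contradiction_graph(H, n_points, m):
    """Vertices and edges of the order-m contradiction graph G_m(H)."""
    vertices = sorted(realizable_sequences(H, n_points, m))
    edges = {(s, t) for s, t in combinations(vertices, 2) if contradict(s, t)}
    return vertices, edges


def vc_dimension(H, n_points):
    """Largest d such that some d-point subset is shattered by H (brute force)."""
    dim = 0
    for d in range(1, n_points + 1):
        for subset in combinations(range(n_points), d):
            if len({tuple(h[x] for x in subset) for h in H}) == 2 ** d:
                dim = d
    return dim


# Threshold classifiers on three points: h is 0 up to some cut, then 1.
H = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)]
vertices, edges = contradiction_graph(H, 3, 1)
dim = vc_dimension(H, 3)
```

For this class, $G_1(H)$ has six vertices (each point with each label) and three edges (one per point, joining its two labels), and the VC dimension is 1.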

2605.20431 2026-05-21 cs.HC cs.RO

Multi-Week, In-Class Deployments of Telepresence Robots With Four Homebound K-12 Students: Benefits, Challenges, and Recommendations

Matthew Rueben, Rhianna Lee, Thomas R. Groechel, Hengzhi Chen, Haemi Lee, Gisele Ragusa, Maja J. Matarić

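AI summary This study examines the benefits and challenges of telepresence robots that let homebound K-12 students attend class, and recommends improvements. It draws on four multi-week deployments and 15 interviews covering student experience and classroom management needs.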

Journal ref
Rueben, M., Lee, R., Groechel, T.R. et al. Multi-week, in-class deployments of telepresence robots with four homebound K-12 students: Benefits, challenges, and recommendations. Educ Inf Technol 31, 2145-2175 (2026)

Abstract

Missing significant amounts of school during K-12 education is known to put students' cognitive and social development at risk. Alternatives such as home instruction and online learning are common, but lack sufficient interaction with peers and teachers in the classroom. Mobile remote presence systems, or telepresence robots, are promising for homebound students because they provide embodiment and mobility in addition to the real-time participation offered by video conferencing technologies. Research is needed, however, for telepresence robots to meet the complex needs of homebound students participating remotely in the K-12 classroom context. We present findings from four multi-week deployments with homebound K-12 students attending classes via telepresence robots. The homebound students' experiences were documented in a total of 15 interviews and analyzed qualitatively as case studies. The homebound student participants and their deployment contexts differed from one another along multiple dimensions, and while some benefits of mobile remote attendance were enjoyed by all participants, each participant also experienced unique benefits. Some challenges with hearing, seeing, and moving the robot around the classroom warranted improvements to the design of the telepresence system. Other challenges suggested priorities for managing a classroom deployment, such as ensuring that the remote student is included in classroom activities, accountable to the teacher, and treated with respect by classmates. Based on insights from the study, we make recommendations for real-world deployment procedures in similar contexts.

2605.20405 2026-05-21 eess.IV cs.AI cs.CV physics.med-ph

Disentangling Sampling from Training Budget in Class-Imbalanced CT Body Composition Segmentation

Iason Skylitsis, Dimitrios Karkalousos, Ivana Išgum

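AI summary This paper adopts episodic sampling from few-shot learning to address class imbalance in medical image segmentation. By disentangling the sampling strategy from the training budget, it reports better segmentation performance for episodic sampling in the low-data regime.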

Abstract

Class imbalance is a fundamental challenge in medical image segmentation, where frequent classes typically dominate training at the expense of rare classes. Loss-based approaches mitigate imbalance by reweighting the per-pixel loss within the batch, while sampling strategies control which images enter the batch. Yet neither explicitly controls which classes appear within the batch, leaving rare-class exposure only partially rebalanced. In this work, we adopt episodic sampling from few-shot learning to promote class-balanced batch construction in a fully supervised setting. We decouple episodic sampling from its conventional metric-learning context and evaluate it in body composition segmentation in CT. We compare episodic sampling against random and weighted sampling on nine muscle and adipose tissues, derived from 210 scans of the public SAROS dataset. Training is performed under full- and low-data regimes, with additional comparisons under matched training iteration budgets. Under full-data training, all three strategies performed comparably (mean Dice 0.882 for episodic, 0.878 for random and weighted). Under low-data training, episodic sampling outperformed random and weighted (0.787 vs. 0.758 and 0.762), driven by a 12-fold difference in training iterations. Under matched training budgets, random and weighted overfit earlier, while episodic improved for approximately three times more iterations before plateauing. Our findings identify the training iteration budget as an under-recognized confound in sampling strategies, motivating iteration-aware evaluation protocols for small datasets. Furthermore, the residual advantage of episodic sampling is consistent with an implicit regularization effect of class-balanced batches, offering a low-cost, model-agnostic strategy for class-imbalanced medical image segmentation. Code is available at https://github.com/iasonsky/episodic-sampling.
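
A class-balanced batch of the kind the abstract describes can be sketched by sampling classes first and images second. This is a minimal sketch under assumptions: the scan IDs, class names, and function names are hypothetical, and the linked repository may construct episodes differently.

```python
import random
from collections import defaultdict


def build_class_index(image_classes):
    """Map each class to the list of images that contain it."""
    index = defaultdict(list)
    for image_id, classes in image_classes.items():
        for cls in classes:
            index[cls].append(image_id)
    return index


def episodic_batch(index, n_classes, per_class, rng):
    """Pick classes first, then draw the same number of images for each class."""
    classes = rng.sample(sorted(index), n_classes)
    batch = []
    for cls in classes:
        # Sampling with replacement lets a rare class still fill its quota.
        batch.extend(rng.choices(index[cls], k=per_class))
    return classes, batch


# Hypothetical toy dataset: "rare" appears in only one scan.
image_classes = {
    "scan1": {"muscle", "fat"},
    "scan2": {"muscle"},
    "scan3": {"muscle", "rare"},
}
index = build_class_index(image_classes)
classes, batch = episodic_batch(index, n_classes=3, per_class=2, rng=random.Random(0))
```

In contrast to random or weighted image sampling, every selected class is guaranteed at least `per_class` images in the batch, so rare-class exposure is controlled directly.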

2605.20400 2026-05-21 stat.AP cs.LG stat.ML

Understanding Deterioration Random Effects for Causal Discovery in Infrastructure Management

Takato Yasuno

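AI summary This paper proposes a framework that combines Bayesian hierarchical hazard modeling with causal discovery to identify the operational patterns driving heterogeneous deterioration rates in pump equipment. It estimates random effects with GPU-accelerated NUTS, validates the linearity assumption, and finds that different operational regimes call for different management strategies.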

Comments 20 pages, 7 figures, 4 tables

Abstract

Infrastructure deterioration poses significant challenges for asset management, yet existing approaches rely on population-averaged models that overlook equipment-specific heterogeneity. We present a novel framework that combines Bayesian hierarchical hazard modeling with causal discovery to identify operational patterns that drive heterogeneous deterioration rates in pump equipment. Our approach first estimates pump-specific random effects $u_i$ using GPU-accelerated No-U-Turn Sampling (NUTS), achieving 3--5$\times$ speedup over CPU implementations. We then employ DirectLiNGAM to discover causal relationships between 22 engineered time-series features and deterioration rates, stratified by positive ($u_i > 0$, faster deterioration) versus negative ($u_i \leq 0$, slower deterioration) random effects. Analyzing 112 pumps with 92,861 observations over 650 days, we uncover striking heterogeneity: the negative group exhibits causal effects 400$\times$ larger than the positive group, with standard deviation (std) showing a strong positive causal effect ($+1.515$) on deterioration rates in low-risk equipment. We validate linearity assumptions through NonlinearLiNGAM comparison and demonstrate practical scalability through GPU acceleration. Our findings enable targeted maintenance strategies by revealing that different operational regimes require fundamentally distinct management approaches, advancing predictive maintenance from population-averaged to heterogeneity-aware decision making.

2605.20391 2026-05-21 cs.CR cs.LG

Latent Geometry as a Structural Monitor: Eigenspace Alignment for Anomaly Detection in Anonymity Networks

Vaibhav Chhabra

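AI summary This paper uses latent geometric structure to monitor anomalies in anonymity networks, detecting anomalous patterns in behavioral populations through eigenspace alignment. On the Tor network, a dual-observer pipeline identifies a stable nine-dimensional load-bearing subspace, and the authors validate its structural stability.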

Comments 14 pages, 5 figures, 1 table

Abstract

Traditional anomaly detection marks events when measured signals cross predefined thresholds. This captures the moment of transition but not the structural pressure that precedes it. We propose treating large behavioral populations as geometric energy landscapes whose deformation can be measured before and during major transitions. The central thesis is that structure precedes geometry: the structural organization of the population is the signal, and geometric metrics are instruments for measuring it. Applied to the Tor anonymity network across 67 consecutive daily observation windows, the dual-observer pipeline identifies a stable nine-dimensional load-bearing subspace invariant across the observation period and validates this structure by Monte Carlo simulation at 16.8 sigma above the noise floor. Primary detection gates achieve 0.0% false positive rate on 24 confirmed stable windows. Forensic analysis of the February 20, 2026 confirmed infrastructure event formally falsifies the relay-departure hypothesis, identifying connectivity degradation without topology change as a detectable network failure mode. The result is a candidate structural-monitoring framework for behavioral populations with sufficient telemetry.

2605.20386 2026-05-21 cs.MM cs.CY cs.HC cs.SD

Music of Changing Lines: Toward a Culturally Situated Approach to the I-Ching

Ling Qi, Aleksandra Teng Ma, Alexandria Smith

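AI summary This paper proposes a culturally situated approach to the I-Ching: an interactive system that treats the I-Ching as a meaning-bearing framework. Coin casting that yields hexagrams and changing lines is accompanied in real time by probabilistic musical processes; the large language model Gemini interprets the result, and the interpretation becomes a prompt for the generative music model Lyria, which produces a responsive musical realization.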

Comments Published and presented at the International Computer Music Conference (ICMC) 2026

Abstract

The I-Ching is one of the most influential texts in Chinese intellectual history, integrating divination, cosmology, and ethical reflection. While Western experimental music, most notably John Cage, has drawn on the I-Ching as a source of chance operation, such appropriations have often detached its formal mechanisms from the interpretive and philosophical processes that give the text meaning. This work, Music of Changing Lines, presents an interactive system that re-centers the I-Ching as a meaning-bearing framework rather than a neutral randomizer. Users perform Wen Wang Fa coin casting, which is accompanied in real time through probabilistic musical processes. The resulting hexagrams and changing lines are interpreted by a large language model, Gemini, in relation to the user's inquiry. This textual interpretation is then translated into a prompt for a generative music model, Lyria, producing a responsive musical realization. By situating AI as an interpretive intermediary rather than a compositional authority, the system foregrounds the I-Ching's ritual, interpretation, and participation as the primary sonic materials. Music of Changing Lines extends process-driven traditions in computer music by demonstrating how generative AI can support participatory, meaning-driven musical processes without prescribing musical structure or replacing human agency.

2605.20368 2026-05-21 cs.CR cs.AI

Security Document Classification with a Fine-Tuned Local Large Language Model: Benchmark Data and an Open-Source System

Ivan Dobrovolskyi

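AI summary This study presents TorchSight, an open-source system that classifies security documents with a fine-tuned Qwen 3.5 27B model. Trained on 78,358 samples from permissively licensed sources and GPT-4 synthetic data, the model reaches high classification accuracy, supporting the use of local models for security document classification.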

Abstract

Organizations that scan documents for sensitive information face a practical problem. Cloud services require data to be sent to external infrastructure, while rule-based tools often miss threats that depend on context. This study presents TorchSight, an open-source local system for security document classification built around a fine-tuned Qwen 3.5 27B model. The model was trained on 78,358 samples from 13 permissively licensed sources and GPT-4 synthetic data covering seven security categories and 51 subcategories. In the main evaluation on 1,000 documents, the model reached 95.0% category-level accuracy (95% confidence interval: 93.5-96.2). The tested commercial models scored 75.4-79.9% under the same prompting protocol. On a separate external set of 500 held-out samples, the model reached 93.8% accuracy, which suggests that performance extends beyond the main benchmark, although the margin depends on dataset composition and difficult boundary cases. The results show that a fine-tuned local model can support accurate security document classification while keeping document processing under local control.
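
The reported interval can be checked with a short calculation. The abstract does not say which interval method was used; the sketch below assumes 950 correct out of 1,000 documents and a Wilson score interval with z = 1.96, which happens to reproduce the stated 93.5-96.2 range.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# Assumption: 95.0% accuracy on 1,000 documents means 950 correct predictions.
low, high = wilson_interval(950, 1000)
low_pct, high_pct = round(low * 100, 1), round(high * 100, 1)
```

The interval is roughly 2.7 percentage points wide, which is far smaller than the gap to the 75.4-79.9% reported for the commercial models on the same 1,000 documents.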

2605.20345 2026-05-21 stat.ML cs.LG

Corrected Integrated Laplace Approximation for Bayesian Inference in Latent Gaussian Models

Jinlin Lai, Charles C. Margossian, Daniel R. Sheldon

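AI summary This paper proposes an importance sampling scheme that corrects the error the integrated Laplace approximation (ILA) introduces in latent Gaussian models (LGMs). As the number of importance samples grows, the approximate posterior converges to the correct posterior. The method is implemented in an automatic differentiation framework to support gradient-based algorithms for hyperparameter inference, in particular Hamiltonian Monte Carlo.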

Abstract

Latent Gaussian models (LGMs) are a popular class of Bayesian hierarchical models that include Gaussian processes, as well as certain spatial models and mixed-effect models. Efficient Bayesian inference of LGMs often requires marginalizing out the latent variables. For LGMs with a non-Gaussian likelihood, exact marginalization is not possible and a popular approach is to do approximate marginalization with an integrated Laplace approximation (ILA). Using ILA produces an approximate posterior which, in some settings, can differ significantly from the correct posterior, which impacts downstream applications. We propose an importance sampling scheme to correct the error introduced by ILA. By increasing the number of samples in importance sampling, the posterior with ILA converges to the correct posterior. This idea is realized with various techniques, including pseudo-marginalization, quasi-Monte Carlo and randomized quasi-Monte Carlo. We implement our methods in an automatic differentiation framework to support gradient-based algorithms when doing inference on the hyperparameters. For the latter, we specifically consider the use of Hamiltonian Monte Carlo. We demonstrate the benefits of reduced error in various applied models.

2605.20328 2026-05-21 cond-mat.stat-mech cond-mat.str-el cs.AI math.CO

Targeting Clause Type Distributions: a Picklock for Random Satisfiability Problems

瞄准子句类型分布:破解随机可满足性问题的一把撬锁工具

J. Schwardt, J. C. Budich

AI总结 本文提出Target-SAT算法,通过利用问题中隐藏的统计信息,将最困难区域内可求解的随机可满足性问题规模提高到约三倍,并解释了传统局部搜索算法受限于巨大低能量陷阱的原因。

Comments 7+2 pages, 6+2 figures

详情
AI中文摘要

诸如NP完全的3-SAT之类的优化问题,为在具有崎岖能量景观的强关联多体系统中寻找基态这一困难任务提供了重要基准。在统计物理中,将随机3-SAT问题作为Ising自旋哈密顿量进行研究已带来重要见解,包括可满足性相变的存在,以及对一条由特别困难实例构成的临界参数线的预测。然而,数十年来求解这些实例的进展十分有限。在此,我们引入Target-SAT(TSAT)算法,将最困难区域内可求解的问题规模提高到约三倍,在大范围的邻近区域中改进更为显著。通过利用隐藏在问题组合约束中的统计信息,TSAT在随机局部搜索中被主动引导至相关参数空间内的一个目标。我们的分析还解释了为什么已有的局部搜索算法因一个巨大的低能量陷阱而受限于相对较小的系统规模。此外,我们用一个占主导地位的额外复杂性势垒来刻画上述临界线,其指数标度仅在周围的参数空间中才能被TSAT迅速克服。借助TSAT,求解已知最困难的随机可满足性问题的领先地位重新回到随机局部搜索算法的阵营。

英文摘要

Optimization problems such as the NP-complete 3-SAT provide an important benchmark for the difficult task of finding ground-states in strongly correlated many-body systems with rugged energy landscapes. The study of random 3-SAT problems as Ising spin Hamiltonians in statistical physics has yielded major insights including the existence of a satisfiability phase transition, and the prediction of a critical parameter line of particularly hard instances. Yet, progress on solving those instances has been scarce for several decades. Here, introducing the Target-SAT (TSAT) algorithm, we roughly triple the tractable problem sizes in the hardest regime, with an even greater improvement in a vast range of neighboring regions. By leveraging statistical information hidden in the combinatorial constraints of the problem, TSAT is actively guided in its stochastic local search toward a target within the relevant parameter space. Our analysis also explains why established local search algorithms are limited to relatively small system sizes due to a vast low-energy trap. Furthermore, we characterize the aforementioned critical line in terms of a dominant additional complexity barrier, whose exponential scaling is quickly overcome by TSAT only in the surrounding parameter space. With TSAT, the lead in solving the hardest known random satisfiability problems returns to the realm of stochastic local search algorithms.

2605.20326 2026-05-21 cond-mat.str-el cs.AI

Representability-Aware Neural Networks for Reduced Density Matrices: Application to Fractional Chern Insulators

面向约化密度矩阵的可表示性感知神经网络:在分数陈绝缘体中的应用

Justin B. Hart, Awwab A. Azam, Thomas Li, Yunxuan Li, Ye Bi, Haining Pan, Jiabin Yu

AI总结 本文提出了一种可表示性感知的神经网络框架,用于预测两粒子约化密度矩阵(2-RDM)。该框架通过架构和损失函数纳入部分可表示性条件,并可作用于不同的动量网格,从而在多个网格上评估可表示性条件;它既可用于预测大动量网格上的2-RDM,也可作为变分2-RDM拟设进行优化。该方法被应用于转角双层MoTe₂的单带投影模型(转角3.89°、空穴填充2/3)。

Comments 12+32 Pages, 4+10 Figures, 0+19 Tables

详情
AI中文摘要

我们开发了一种可表示性感知且可插值的神经网络(NN)框架,用于预测两粒子约化密度矩阵(2-RDM)。该NN通过其架构和损失函数纳入了部分可表示性条件,并能够作用于不同的动量网格,从而可以在多个网格上评估可表示性条件,我们称之为插值可表示性条件。该框架既可以通过对小网格上的精确结果进行插值来预测大动量网格上的2-RDM,也可以作为变分2-RDM拟设,在任意网格上通过能量最小化进行优化。我们将该方法应用于转角双层MoTe₂单带投影模型中的分数陈绝缘体,转角为3.89°,空穴填充为2/3。我们使用六种不同的NN架构,在具有12或18个动量点的网格的精确对角化(ED)2-RDM上进行训练;其中最佳的NN是残差多层感知机,它预测的6×6 2-RDM相对于ED 2-RDM的精度为97.07%-98.18%,但预测的能量比ED基态能量高77.353 meV。随后,我们在包括6×6在内的多个网格上对NN进行变分优化,预测的6×6能量仅比ED低0.104 meV,同时保持98.94%-98.96%的精度。传统的边界点半正定规划给出的能量比ED低5.560 meV,精度为96.40%-98.94%;与之相比,该NN的能量更准确、精度相近,而所用参数数量不到其1/20。最后,我们在NN的变分优化中加入一个具有48个动量点的对称网格,并给出该网格上的多体基态能量和多体量子度规的预测。

英文摘要

We develop a representability-aware and interpolable neural network (NN) framework for predicting two-particle reduced density matrices (2-RDMs). The NN incorporates a subset of representability conditions through its architecture and loss function, and can operate on different momentum meshes, which enables evaluating the representability conditions across multiple meshes; we call these interpolated representability conditions. The framework can be used either to predict 2-RDMs on large momentum meshes by interpolating exact results from small meshes, or as a variational 2-RDM ansatz optimized by energy minimization on arbitrary meshes. We apply this approach to the fractional Chern insulator in the one-band projected model of twisted bilayer MoTe$_2$ at twist angle $3.89^\circ$ and hole filling $2/3$. Trained on exact-diagonalization (ED) 2-RDMs from meshes with $12$ or $18$ momentum points using six different NN architectures, the best NN is the residual multilayer perceptron, which predicts the $6\times6$ 2-RDM with $97.07\%-98.18\%$ accuracy relative to the ED 2-RDM but predicts an energy $77.353$ meV above the ED ground-state energy. We then variationally optimize the NN on several meshes including $6\times6$, predicting a $6\times 6$ energy just $0.104$ meV below ED while maintaining $98.94\%-98.96\%$ accuracy. Compared with conventional boundary-point semidefinite programming, which gives an energy $5.560$ meV below ED with $96.40\%-98.94\%$ accuracy, the NN achieves a more accurate energy and similar accuracy while using fewer than 1/20 as many parameters. Finally, we add a symmetric mesh of $48$ momentum points to the variational optimization of the NN, and provide a prediction of the many-body ground-state energy and the many-body quantum metric on that mesh.

2605.20290 2026-05-21 cs.GR cs.CV

TelePhysics: Physics-Grounded Multi-Object Scene Generation from a Single Image with Real-Time Interaction

TelePhysics:基于单张图像的物理一致多物体场景生成与实时交互

Xin Zhang, Yabo Chen, Yijie Fang, Wanying Qu, Haibin Huang, Chi Zhang, Feng Xu, Xuelong Li

AI总结 本文提出TelePhysics,一种无需训练的框架,通过整体场景级3D重建将单张图像转换为物理一致且可控的视频。该方法通过统一的空间坐标系统表示完整场景几何,解决物体穿透和对齐模糊问题,实现准确的多物体交互和更丰富的复杂控制类型,从而在保持逼真视觉保真度的同时实现实时物理交互预览。

详情
AI中文摘要

Recent generative video models achieve impressive visual quality but remain constrained by limited physical consistency and controllability. Existing video generation methods provide minimal physical control, and single-image-to-3D conversion approaches often suffer from object interpenetration. Furthermore, physics-based scene-level 3D generation methods exhibit spatial misalignment, stylized artifacts, and inconsistencies with the input data, restricting their use in realistic interactive video synthesis. We propose TelePhysics, a training-free framework that converts a single image into a physically consistent and controllable video through holistic scene-level 3D reconstruction. By representing the full scene geometry in a unified spatial coordinate system, TelePhysics resolves object penetration and alignment ambiguity. Unlike prior methods, this formulation enables accurate scene-level multi-object interactions and introduces richer, complex control types for advanced mechanics-based manipulation. By decoupling simulation from rendering, TelePhysics bypasses latency-heavy priors, achieving real-time physical interaction previews while preserving photorealistic visual fidelity. Experimental results demonstrate that TelePhysics substantially outperforms prior methods in physical fidelity, spatial coherence, and controllability. The open-source code is available at https://github.com/xinzhang007/TelePhysics.

英文摘要

Recent generative video models achieve impressive visual quality but remain constrained by limited physical consistency and controllability. Existing video generation methods provide minimal physical control, and single-image-to-3D conversion approaches often suffer from object interpenetration. Furthermore, physics-based scene-level 3D generation methods exhibit spatial misalignment, stylized artifacts, and inconsistencies with the input data, restricting their use in realistic interactive video synthesis. We propose TelePhysics, a training-free framework that converts a single image into a physically consistent and controllable video through holistic scene-level 3D reconstruction. By representing the full scene geometry in a unified spatial coordinate system, TelePhysics resolves object penetration and alignment ambiguity. Unlike prior methods, this formulation enables accurate scene-level multi-object interactions and introduces richer, complex control types for advanced mechanics-based manipulation. By decoupling simulation from rendering, TelePhysics bypasses latency-heavy priors, achieving real-time physical interaction previews while preserving photorealistic visual fidelity. Experimental results demonstrate that TelePhysics substantially outperforms prior methods in physical fidelity, spatial coherence, and controllability. The open-source code is available at https://github.com/xinzhang007/TelePhysics.

2605.20286 2026-05-21 cs.CR cs.LG

Adaptive Probe-based Steering for Robust LLM Jailbreaking

用于鲁棒大语言模型越狱的自适应探针引导

Junxi Chen, Junhao Dong, Xiaohua Xie

AI总结 本文提出了一种自适应的探针引导方法:借助模型提取的思想使学习到的引导向量逼近理想向量,并根据对比激活的统计信息自适应地调整引导强度,从而提升大语言模型越狱的鲁棒性和有效性,无需额外对比提示或手动调参。

Comments 19 pages, 13 figures, accepted by ICML 2026

详情
AI中文摘要

近期研究表明,对比引导在大语言模型(LLMs)越狱方面具有潜力。然而,现有方法依赖于数量有限且本质上有偏的对比提示,并且需要费力地手动调节引导强度,这限制了其鲁棒性和有效性。本文借助模型提取的思想,引导学习到的引导向量逼近理想向量,并提出根据对比激活的统计信息自适应地调节引导强度。实验表明,我们的方法显著提高了基于探针的引导的有效性和鲁棒性,且无需任何额外的对比提示或费力的手动调参。作为一篇攻击论文,本文着重揭示经过加固的LLMs的失守,将平均有害性得分从6%提升到70%。我们的代码可在https://github.com/fhdnskfbeuv/adaptiveSteering获得。

英文摘要

Recent work has demonstrated the potential of contrastive steering for jailbreaking Large Language Models (LLMs). However, existing methods rely on limited and inherently biased contrastive prompts and require laborious manual tuning of steering strength, limiting their robustness and effectiveness. In this paper, we leverage the idea of model extraction to guide the learned steering vectors to approximate the ideal one and propose tuning the steering strength adaptively based on contrastive activations' statistics. Experiments demonstrate that our method notably improves the effectiveness and robustness of probe-based steering, without any extra contrastive prompts or laborious manual tuning. Being an attack paper, this paper focuses on revealing the breakdown of fortified LLMs, raising the average harmfulness score from 6\% to 70\%. Our code is available at https://github.com/fhdnskfbeuv/adaptiveSteering.

2605.20281 2026-05-21 econ.GN cs.LG q-fin.EC

The Economics of AI Inference: Inflation Dynamics, Welfare Costs, and Optimal Monetary Policy under the Inference-Cost Phillips Curve

人工智能推理的经济学:推理成本菲利普斯曲线下的通胀动态、福利成本与最优货币政策

Gustav Olaf Yunus Laitinen-Fredriksson Lundström-Imanov

AI总结 本文提出了一种关于人工智能推理成本及其向通胀、福利和最优货币政策传导的统一微观经济与货币理论。文章引入推理成本菲利普斯曲线(ICPC)并证明其闭式结构斜率,推导了消费者福利的希克斯-卡尔多(Hicks-Kaldor)分解,证明了广义泰勒原则,并刻画了最优货币政策响应系数。

Comments 6 pages, 5 tables

详情
AI中文摘要

我们发展了一种关于人工智能推理成本及其向通胀、福利和最优货币政策传导的统一微观经济与货币理论。我们引入推理成本菲利普斯曲线(ICPC),即一条增广的新凯恩斯菲利普斯曲线,其中企业层面生产差异化商品的边际成本包含一个非平凡的人工智能推理成分lambda-bar,并证明了闭式结构斜率kappa*_inf = lambda-bar * kappa,其中kappa是标准的Calvo-Yun斜率。我们推导了推理成本冲击下消费者福利的希克斯-卡尔多(Hicks-Kaldor)分解,证明了推理增广经济中的广义泰勒原则,并刻画了承诺下的最优货币政策响应系数psi*_inf = (1 + phi*rho) * lambda-bar * kappa。一个二阶福利损失公式以闭式形式使模型闭合。我们使用两步GMM估计量、Newey-West HAC标准误和Hansen J检验,以美国2022年1月至2026年4月的月度数据检验该理论,得到经验斜率kappa-hat_inf = 0.087(HAC标准误0.021),位于结构预测值的一个标准误之内。对50个滚动子窗口的标度回归得到b-hat = 0.987(R^2 = 0.998),与接近单位弹性的传导一致。采用Driscoll-Kraay HAC标准误的G7简约式面板得到b-hat^G7 = 0.094(标准误0.026),Wald检验未能拒绝跨国同质性(p = 0.78)。该框架为联合研究人工智能推理成本动态、生成式AI冲击下的货币政策以及推理驱动通胀的福利成本提供了统一的均衡框架。

英文摘要

We develop a unified microeconomic and monetary theory of artificial intelligence inference costs and their pass-through to inflation, welfare, and optimal monetary policy. We introduce the Inference-Cost Phillips Curve (ICPC), an augmented New Keynesian Phillips curve in which firm-level marginal costs of producing differentiated goods include a non-trivial AI inference component lambda-bar, and prove a closed-form structural slope kappa*_inf = lambda-bar * kappa, where kappa is the standard Calvo-Yun slope. We derive a welfare-relevant Hicks-Kaldor decomposition of consumer welfare under inference-cost shocks, prove a generalized Taylor principle for the inference-augmented economy, and characterize the optimal monetary policy response coefficient psi*_inf = (1 + phi*rho) * lambda-bar * kappa under commitment. A second-order welfare loss formula closes the model in closed form. We confront the theory with U.S. monthly data 2022:M01-2026:M04 using a two-step GMM estimator with Newey-West HAC standard errors and Hansen J-test, recovering an empirical slope kappa-hat_inf = 0.087 (HAC s.e. 0.021) which lies within one standard error of the structural prediction. A scaling regression over 50 rolling-window subwindows yields b-hat = 0.987 (R^2 = 0.998), consistent with a near-unit-elasticity pass-through. A G7 reduced-form panel with Driscoll-Kraay HAC standard errors yields b-hat^G7 = 0.094 (s.e. 0.026), and a Wald test fails to reject cross-country homogeneity (p = 0.78). The framework provides a single equilibrium scaffold for the joint study of AI inference cost dynamics, monetary policy under generative-AI shocks, and the welfare cost of inference-driven inflation.

2605.20279 2026-05-21 econ.GN cs.CY cs.LG q-fin.EC

The Economics of Model Collapse: Equilibrium, Welfare, and Optimal Provenance Subsidies in Synthetic Data Markets

模型崩溃的经济学:均衡、福利与合成数据市场中的最优来源补贴

Gustav Olaf Yunus Laitinen-Fredriksson Lundström-Imanov

AI总结 本文研究了合成数据市场中模型崩溃的微观经济学问题,提出了合成数据污染均衡理论,推导了福利分解公式,并得出了最优来源补贴和水印强度的闭式表达式,同时证明了信息约束下的实现不可能性。

Comments 7 pages, 5 tables, 1 algorithm; IEEEtran conference format; submitted to IEEE BigData 2026

详情
AI中文摘要

生成式人工智能正在迅速改变训练数据的供给侧:越来越多的新词元、图像和结构化记录由上一代模型而非人类创作者生成。在此类合成内容上的递归训练会导致可测量且往往不可逆的分布保真度损失,这一现象称为模型崩溃。我们发展了首个模型崩溃下合成数据市场的统一微观经济学理论。我们引入合成数据污染均衡(SDCE),证明其存在性和一般唯一性,推导福利分解W = W_prod + W_cons - L_coll - L_info,建立Wasserstein梯度流平均场崩溃极限,证明信息约束下实施的不可能性,并得到福利最大化的来源补贴s* = KL(q||p)/(2 kappa)和福利最大化的水印强度w* = (1 - psi) KL(q||p)/(2 kappa psi)的闭式表达式。我们针对任何仅使用生产者侧观测的来源估计量证明了信息论意义上的Cramer-Rao下界,并表明来源市场迭代再训练(PMIR)算法在相差常数因子的意义下达到该下界,同时在O(epsilon^-2 log T)次迭代内收敛到epsilon-SDCE。在C4合成基准上对十个再训练世代进行的简约式OLS估计得到崩溃率系数b-hat = 0.181(HAC标准误0.024),位于结构预测值0.183的一个标准误之内。校准实验使第十代模型质量比无监管基准提高23.1%,同时将留出多样性探针上的2-Wasserstein漂移从0.318降至0.142。在世代t ∈ {1,...,10}上的标度实验恢复了对数形式的崩溃定律log Q_t = log Q_0 - 0.183 t rho^2,R^2 = 0.962。

英文摘要

Generative artificial intelligence is rapidly transforming the supply side of training data: an increasing share of new tokens, images, and structured records is produced by previous-generation models rather than by human originators. Recursive training on such synthetic content induces a measurable and often irreversible loss of distributional fidelity, a phenomenon known as model collapse. We develop the first unified microeconomic theory of synthetic data markets under model collapse. We introduce the Synthetic Data Contamination Equilibrium (SDCE), prove existence and generic uniqueness, derive a welfare decomposition W = W_prod + W_cons - L_coll - L_info, establish a Wasserstein-gradient-flow mean-field collapse limit, prove an impossibility of information-constrained implementation, and obtain closed-form expressions for the welfare-maximizing provenance subsidy s* = KL(q||p)/(2 kappa) and the welfare-maximizing watermark strength w* = (1 - psi) KL(q||p)/(2 kappa psi). We prove an information-theoretic Cramer-Rao lower bound on any provenance estimator using only producer-side observations and show that the Provenance-Market Iterative Retraining (PMIR) algorithm attains this bound up to constants while converging to an epsilon-SDCE in O(epsilon^-2 log T) iterations. A reduced-form OLS estimation on a C4-synthetic benchmark over ten retraining generations yields a collapse-rate coefficient b-hat = 0.181 (HAC s.e. 0.024), within one standard error of the structural prediction 0.183. Calibrated experiments raise generation-ten model quality by 23.1 percent over the unregulated benchmark while lowering the 2-Wasserstein drift on a held-out diversity probe from 0.318 to 0.142. Scaling experiments over generations t in {1,...,10} recover a logarithmic-in-t collapse law log Q_t = log Q_0 - 0.183 t rho^2 with R^2 = 0.962.
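
The two closed-form expressions quoted in the abstract are simple to evaluate once their inputs are known. The sketch below only restates that arithmetic; the abstract does not define q, p, kappa, or psi in detail, so the discrete distributions and parameter values used here are hypothetical placeholders, and the function names are illustrative.

```python
import math

def kl_divergence(q: list[float], p: list[float]) -> float:
    """KL(q || p) for discrete distributions on the same support."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def optimal_subsidy(kl: float, kappa: float) -> float:
    """s* = KL(q||p) / (2 kappa), as quoted in the abstract."""
    return kl / (2.0 * kappa)

def optimal_watermark(kl: float, kappa: float, psi: float) -> float:
    """w* = (1 - psi) KL(q||p) / (2 kappa psi), as quoted in the abstract."""
    return (1.0 - psi) * kl / (2.0 * kappa * psi)

# Hypothetical inputs, chosen only to make the arithmetic concrete.
q = [0.5, 0.5]
p = [0.25, 0.75]
kappa, psi = 1.0, 0.4

kl = kl_divergence(q, p)
s_star = optimal_subsidy(kl, kappa)
w_star = optimal_watermark(kl, kappa, psi)
print(f"KL = {kl:.4f}, s* = {s_star:.4f}, w* = {w_star:.4f}")

# The reported estimate 0.181 (s.e. 0.024) versus the structural prediction 0.183.
within_one_se = abs(0.181 - 0.183) <= 0.024
print(within_one_se)  # prints True
```

Written this way, the two formulas differ only by the factor (1 - psi) / psi, so w* = s* (1 - psi) / psi.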

2605.20274 2026-05-21 cs.GR cs.AI

PolycubeNet: A Dual-latent Diffusion Model for Polycube-Based Hexahedral Mesh Generation

PolycubeNet: 一种基于多立方体的双潜在扩散模型用于六面体网格生成

Lu He, Qitao Deng, Junjiang Deng, Liangbin Deng, Yanjun Liang, Wenting Yang, Guoqiang Wang, Na Lei

AI总结 本文提出了一种基于条件扩散模型的端到端框架,用于生成基于多立方体的六面体网格,通过双潜在条件扩散架构有效解耦了计算复杂度与输入输出分辨率,提高了生成效率和鲁棒性。

详情
AI中文摘要

六面体网格广泛用于模拟流水线,但自动生成复杂CAD几何体仍具有挑战性。基于多立方体的六面体网格生成是一种代表性方法,因其结构规则且参数化友好,但现有多立方体构造方法通常依赖于复杂的表面分割和局部启发式方法,这可能会产生伪影或在困难形状上失败。在本文中,我们提出了一种基于条件扩散模型的端到端框架用于多立方体生成。给定一个以点云表示的输入几何体,我们的方法直接生成对应的多立方体点云,消除了显式表面分割或预定义多立方体模板的需要。我们方法的核心是一种双潜在条件扩散架构,将计算上昂贵的自注意力操作限制在固定容量、低维的潜在空间中。这种设计有效地将计算复杂度与输入几何体和输出多立方体的分辨率解耦,从而避免了点云自注意力机制中典型的二次成本,同时支持灵活的输入和输出分辨率。为了获得六面体网格,生成的多立方体通过刚性和非刚性点云配准对齐到输入形状,以建立表面对应关系,随后通过多立方体到六面体的流程。我们还创建并发布了CAD网格及其对应的多立方体网格配对数据集,以及我们模型的核心实现。实验表明,PolycubeNet能够泛化到具有任意亏格的复杂CAD模型,并在几秒钟内生成高质量的多立方体结构,比先前基于学习的方法在鲁棒性和效率上有所提升。

英文摘要

Hexahedral meshes are widely used in simulation pipelines, yet automatic generation remains challenging for complex CAD geometries. Polycube-based hexahedral meshing is a representative approach due to its regular, parameterization-friendly structure, but existing polycube construction methods often rely on intricate surface segmentation and local heuristics, which can produce artifacts or fail on difficult shapes. In this paper, we propose an end-to-end framework for polycube generation based on conditional diffusion models. Given an input geometry represented as a point cloud, our method directly produces a corresponding polycube point cloud, eliminating the need for explicit surface segmentation or predefined polycube templates. At the core of our approach is a dual-latent conditional diffusion architecture that confines computationally expensive self-attention operations to a fixed-capacity, low-dimensional latent space. This design effectively decouples computational complexity from the resolution of both the input geometry and the output polycube, thereby avoiding the quadratic cost typical of point cloud self-attention mechanisms while supporting flexible input and output resolutions. To obtain a hexahedral mesh, the generated polycube is aligned to the input shape via rigid and non-rigid point cloud registration to establish surface correspondence, followed by a polycube-to-hex pipeline. We additionally create and release a paired dataset of CAD meshes and their corresponding polycube meshes, together with the core implementation of our model. Experiments show that PolycubeNet generalizes to complex CAD models with arbitrary genus and produces high-quality polycube structures within seconds, improving robustness and efficiency over prior learning-based approaches.

2605.20271 2026-05-21 stat.ML cs.LG

Multi-Head Attention as Ensemble Nadaraya-Watson Estimation: Variance Reduction, Decorrelation, and Optimal Head Diversity

多头注意力作为Nadaraya-Watson估计的集成:方差缩减、去相关与最优头多样性

Ernest Fokoué

AI总结 本文将多头注意力视为Nadaraya-Watson核回归估计器的集成,通过分析各头输出的去相关性,推导出方差缩减与头多样性之间的关系,提出头多样性指数(HDI)来衡量头间去相关程度,并给出最优的头数与每头维度分配。

Comments 14 pages

详情
AI中文摘要

我们发展了将多头注意力(MHA)视为Nadaraya-Watson(NW)核回归估计器集成的严格统计理论。基于单头softmax注意力与NW估计器之间的代数恒等式,我们证明MHA是H个NW估计器的结构化集成,每个估计器作用于键空间中一个不同的、经学习得到的投影子空间。我们推导出MHA均方误差的显式偏差-方差-协方差分解,表明方差缩减不仅取决于头数H,更从根本上取决于各头输出的去相关性。去相关性由学习到的投影子空间之间的主角(principal angles)决定:正交投影带来最大的方差缩减;对齐的投影则不带来任何缩减。我们引入头多样性指数(HDI),一个可计算的、衡量头间去相关程度的谱度量,并证明MHA均方误差随HDI单调递减。这为经验上观察到的注意力头专门化现象提供了首个严格的理论解释。在固定总维度预算D = H * d_k下,我们求解最优头维度分配问题,由数据分布和回归光滑度推导出使MSE最小的配对(H*, d_k*)。该解给出一条新的架构标度律:最优每头维度随训练集规模对数增长,而最优头数随总预算D近似线性增长。我们的框架统一了三条已有的研究脉络:单头注意力的NW理论、集成学习的一般加权理论,以及生物集成与计算集成之间的去相关-方差缩减同构。多头注意力是Transformer对一条普适原则的具体实现:相同的个体加上促进多样性的机制会产生涌现的最优性。

英文摘要

We develop a rigorous statistical theory of multi-head attention (MHA) as an ensemble of Nadaraya-Watson (NW) kernel regression estimators. Building on the algebraic identity between single-head softmax attention and the NW estimator, we prove that MHA is a structured ensemble of H NW estimators, each operating in a distinct learned projection subspace of the key space. We derive an explicit Bias-Variance-Covariance decomposition of the MHA mean squared error, showing that variance reduction depends not merely on the number of heads H but fundamentally on the decorrelation of head outputs. Decorrelation is governed by the principal angles between learned projection subspaces: orthogonal projections yield maximum variance reduction; aligned projections yield none. We introduce the Head Diversity Index (HDI), a computable spectral measure of inter-head decorrelation, and prove that MHA mean squared error is monotonically decreasing in HDI. This provides the first rigorous theoretical explanation for the empirically observed specialization of attention heads. Under a fixed total-dimension budget D = H * d_k, we solve the optimal head-dimension allocation problem, deriving the MSE-minimizing pair (H*, d_k*) from data distribution and regression smoothness. The solution yields a new architectural scaling law: the optimal per-head dimension grows logarithmically with training set size, while the optimal number of heads grows nearly linearly with the total budget D. Our framework unifies three strands of prior work: the NW theory of single-head attention, the general weighting theory for ensemble learning, and the decorrelation-variance-reduction isomorphism between biological and computational ensembles. Multi-head attention is the Transformer's instantiation of a universal principle: identical agents plus diversity-enforcing mechanisms yield emergent optimality.
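
The single-head identity the abstract builds on can be checked numerically. The sketch below covers only that identity, not the multi-head ensemble analysis: softmax attention over scalar values equals a Nadaraya-Watson estimate with the exponential kernel K(q, k) = exp(q·k / sqrt(d)). The toy keys, values, and query are arbitrary, and the function names are illustrative.

```python
import math

def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def softmax_attention(query, keys, values):
    """Single-head attention with scalar values: softmax(q.k / sqrt(d)) applied to v."""
    scale = math.sqrt(len(query))
    scores = [dot(query, k) / scale for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return sum((e / total) * v for e, v in zip(exps, values))

def nadaraya_watson(query, keys, values):
    """NW estimate: kernel-weighted average of values with K(q, k) = exp(q.k / sqrt(d))."""
    scale = math.sqrt(len(query))
    weights = [math.exp(dot(query, k) / scale) for k in keys]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
values = [2.0, -1.0, 0.5, 3.0]
query = [0.3, -0.7]

attn = softmax_attention(query, keys, values)
nw = nadaraya_watson(query, keys, values)
print(abs(attn - nw) < 1e-12)  # prints True
```

The max-subtraction in the softmax cancels between numerator and denominator, which is why the two functions agree up to floating-point rounding.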

2605.20254 2026-05-21 cs.IR cs.AI cs.CV cs.LG

Efficient Table QA via TableGrid Navigation and Progressive Inference Prompting

通过表格网格导航和逐步推理提示实现高效的表格问答

Amritansh Maurya, Navjot Singh, Mohammed Javed, Omar Moured

AI总结 本文提出了一种无需训练的表格问答方法,通过TableGrid导航和Progressive Inference Prompting框架,提升了表格问答的精度和效率,并在多个数据集上验证了其有效性。

Comments Accepted for Presentation in ICDAR 2026, Vienna, Austria

详情
AI中文摘要

大型语言模型(LLMs)在自然语言处理任务中表现出色,但在表格数据上的表现仍需进一步研究,因为表格问答(TQA)需要精确的单元格检索和多步结构化推理。现有工作通过微调或在任务特定的表格数据上训练LLMs来改进TQA,但通常缺乏对模型如何导航表格和推导答案的可验证控制。在本文中,我们提出了一种无需训练的TQA方法,包含两个结构化提示框架:TableGrid导航(TGN),通过三模块循环迭代导航行和列以定位证据并细化答案;Progressive Inference Prompting(PIP),通过根据查询强制识别列,以明确的逐步行选择约束进行推理。我们在TableBench和FeTaQa数据集上评估了17个LLMs和6个基线模型。在TableBench上,TGN比最强基线提高了3.8分,而在FeTaQa上,PIP在ReAct和Chain-of-Thought上实现了SOTA性能。除了推理时间的提升外,PIP和TGN还可以作为监督模板来微调小型模型,在资源受限的设置中缩小与更大架构之间的性能差距,为TQA提供了多功能且成本效益高的解决方案。

英文摘要

Large Language Models (LLMs) have shown promising results on NLP tasks; however, their performance on tabular data still needs research attention, because Table Question-Answering (TQA) requires precise cell retrieval and multi-step structured reasoning. Existing work improves TQA by fine-tuning or training LLMs on task-specific tabular data, but often lacks verifiable control over how the model navigates tables and derives answers. In this work, we propose a training-free TQA approach with two structured prompting frameworks: TableGrid Navigation (TGN), which iteratively navigates rows and columns via a three-module loop to locate evidence and refine answers, and Progressive Inference Prompting (PIP), which enforces column identification for explicit, progressive row-selection constraints according to the query. We evaluate 17 LLMs against 6 baselines on the TableBench and FeTaQa datasets. On TableBench, TGN improves over the strongest baseline by 3.8 points, and on FeTaQa, PIP achieves SOTA performance over ReAct and Chain-of-Thought. Beyond inference-time gains, PIP and TGN can also serve as supervision templates to fine-tune small models, narrowing the performance gap to much larger architectures in resource-constrained settings and offering a versatile and cost-efficient solution for TQA.

2605.20245 2026-05-21 cs.SI cs.LG

Prism: Structural Symmetry Scanning via Duality-Constrained Laplacian Projection

Prism:通过对偶约束拉普拉斯投影进行结构对称性扫描

Jiatong Xie

AI总结 Prism利用图拉普拉斯矩阵和对偶算子计算对偶缺陷,以度量复杂网络偏离结构自洽性的程度,并在多个数据集上验证了其在社区检测和结构应力检测中的有效性。

Comments 10 pages, 4 tables, 1 figure. This work presents a first-principles unsupervised network structural diagnosis framework based on symmetric involution operator and Laplacian commutator constraint. It achieves noise-robust community detection and early structural risk detection in financial time-series networks without supervised training data

详情
AI中文摘要

我们介绍了Prism,一个用于复杂网络结构对称性诊断的框架。给定图拉普拉斯矩阵L和对偶算子P(一个对称对合),Prism计算对偶缺陷δ(L,P) = ||LP - PL||_F / ||L||_F,这是一个衡量网络偏离结构自洽性程度的标量。当P编码了网络的真实对称性时,δ起初接近零,并随结构退化单调上升;任意选取的P只会给出噪声。我们证明了满足[L', P] = 0的最优L'由闭式的块对角投影给出,并提供了一种无监督的交替优化方法,从图自身的Fiedler向量中学习P。在合成网络上的实验表明,真实P对应的缺陷对结构退化的敏感度是索引反转基线的3.38倍,也比模块度更敏感。在带有边噪声的Zachary空手道俱乐部网络上,Prism在5%噪声下达到94.5%的社区检测准确率,而原始拉普拉斯基线为76.6%。应用于实时标普500数据(2026-05-17)时,Prism检测到结构应力上升(缺陷在90天内由0.43升至0.73),而表面相关性仍然较低,这是基于相关性的方法无法察觉的信号。在涵盖五次重大压力事件(2011-2020)的历史回测中,对偶缺陷表现出一致的模式:它在伴随每次危机出现的相关性尖峰之前就已达到较高水平,并在传统指标归类为平静的结构脆弱期内维持高读数。对偶缺陷是一种基于第一性原理的结构可容许性条件,无需训练数据,可在毫秒内计算。

英文摘要

We introduce \textbf{Prism}, a framework for structural symmetry diagnosis in complex networks. Given a graph Laplacian $L$ and a duality operator $P$ (a symmetric involution), Prism computes the \emph{duality defect} $δ(L,P) = \|LP - PL\|_F / \|L\|_F$ -- a scalar measuring how far the network deviates from structural self-consistency. When $P$ encodes the network's true symmetry, $δ$ starts near zero and rises monotonically as structure degrades; an arbitrary $P$ gives noise. We prove that the optimal $L'$ satisfying $[L', P] = 0$ is given by a closed-form block-diagonal projection, and provide an unsupervised alternating optimization that learns $P$ from the graph's own Fiedler vector. Experiments on synthetic networks show the true-$P$ defect is $3.38\times$ more sensitive to structural degradation than an index-reversal baseline and more sensitive than modularity. On Zachary's Karate Club with edge noise, Prism achieves $94.5\%$ community detection accuracy at $5\%$ noise versus $76.6\%$ for the raw Laplacian baseline. Applied to live S\&P~500 data (2026-05-17), Prism detects rising structural stress (defect $0.43 \to 0.73$ over 90 days) while surface correlations remain low -- a signal invisible to correlation-based methods. In a historical backtest spanning five major stress events (2011--2020), the duality defect exhibits a consistent pattern: it reaches elevated levels \emph{before} the correlation spike that accompanies each crisis, and sustains high readings during periods of structural fragility that conventional metrics classify as calm. The duality defect is a first-principles structural admissibility condition, requiring no training data and computable in milliseconds.
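
The duality defect defined above is a short computation. The sketch below evaluates it on a 4-node path graph, whose true symmetry is node reversal, and then on a copy with one extra edge that breaks that symmetry. The projection step uses the symmetrization (L + PLP)/2, which is the Frobenius-nearest matrix commuting with a symmetric involution P; that this coincides with the closed-form projection in the paper is an assumption here, and all function names are illustrative.

```python
import math

Matrix = list[list[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def frobenius(a: Matrix) -> float:
    return math.sqrt(sum(x * x for row in a for x in row))

def laplacian(n: int, edges: list[tuple[int, int]]) -> Matrix:
    """Unweighted graph Laplacian L = D - A."""
    lap = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1.0
        lap[v][v] += 1.0
        lap[u][v] -= 1.0
        lap[v][u] -= 1.0
    return lap

def duality_defect(lap: Matrix, p: Matrix) -> float:
    """delta(L, P) = ||LP - PL||_F / ||L||_F."""
    lp, pl = matmul(lap, p), matmul(p, lap)
    comm = [[lp[i][j] - pl[i][j] for j in range(len(lap))] for i in range(len(lap))]
    return frobenius(comm) / frobenius(lap)

def project_to_commutant(lap: Matrix, p: Matrix) -> Matrix:
    """(L + P L P) / 2, which commutes with P whenever P is an involution."""
    plp = matmul(matmul(p, lap), p)
    return [[0.5 * (lap[i][j] + plp[i][j]) for j in range(len(lap))] for i in range(len(lap))]

n = 4
reversal = [[1.0 if j == n - 1 - i else 0.0 for j in range(n)] for i in range(n)]
path = laplacian(n, [(0, 1), (1, 2), (2, 3)])
broken = laplacian(n, [(0, 1), (1, 2), (2, 3), (0, 2)])  # extra edge breaks the symmetry

print(duality_defect(path, reversal))       # 0.0: reversal is a true symmetry of the path
print(duality_defect(broken, reversal) > 0)  # True: the extra edge is not mirrored
print(duality_defect(project_to_commutant(broken, reversal), reversal))  # 0.0
```

In this toy case node reversal happens to be the true symmetry; in the abstract's experiments index reversal serves only as a baseline against the learned P.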

2605.20244 2026-05-21 cs.LO cs.AI cs.CL cs.LG cs.SE

Lean Refactor: Multi-Objective Controllable Proof Optimization via Agentic Strategy Search

Lean Refactor: 通过智能体策略搜索实现多目标可控的证明优化

Jialin Lu, Soonho Kong, Rodrigo Stehling, Kaiyu Yang, Zhangyang Wang, Weiran Sun, Wuyang Chen

AI总结 本文提出Lean Refactor框架,利用从预先标注的多目标重构策略数据库中检索到的策略来引导智能体LLM,实现多目标、可控且版本鲁棒的Lean证明重构。

详情
AI中文摘要

我们提出了Lean Refactor,一个即插即用的检索增强型智能体框架,用于对Lean证明进行多目标、可控且版本鲁棒的重构。LLM生成的证明往往正确但冗长,并且在不同库版本之间十分脆弱,而现有重构工作忽视了三个实际挑战:1)Lean重构本质上是多目标的(证明长度、编译成本和版本兼容性往往相互制约);2)Lean仓库的兼容性很脆弱,而LLM的各个发布版本并不了解Lean/Mathlib的版本;3)基于训练的流水线需要在每次新LLM发布时重新微调,既无法跟上模型更替,也无法跟上Lean的发布周期。Lean Refactor利用从精心整理的多目标重构策略数据库中检索到的内容来引导一个冻结的智能体LLM,其中每条策略都带有密集的元数据标注,例如所支持的Lean/Mathlib版本和预期的编译成本降低幅度。实验显示,该方法在竞赛基准上实现了超过70%的词元级压缩,在研究仓库上超过20%,编译时间最多减少60%,优于先前工作和Claude Code。版本过滤检索进一步提高了在目标Lean版本上的压缩效果,并且重构后的miniF2F证明在向未来Lean版本的零样本版本迁移中表现强于未重构的证明。

英文摘要

We present Lean Refactor, a plug-and-play retrieval-augmented agentic framework for multi-objective, controllable, and version-robust refactoring of Lean proofs. LLM-generated proofs are notoriously correct-but-verbose and brittle across library versions, yet existing refactoring works overlook three practical challenges: 1) Lean refactoring is natively multi-objective (proof length, compilation cost, and version compatibility are often in tension); 2) Lean repositories have fragile compatibility, whereas LLM releases are unaware of Lean/Mathlib versions; 3) Training-based pipelines require repeated fine-tuning with each new LLM release, scaling neither with model churn nor with Lean's release cycle. Lean Refactor steers a frozen agentic LLM with retrievals from a curated database of multi-objective refactoring strategies, each densely annotated with metadata such as supported Lean/Mathlib versions and expected compilation-cost reduction. Experiments show over $70\%$ token-level compression on competition benchmarks, over $20\%$ on research repositories, and up to $60\%$ compilation-time reduction, outperforming prior work and Claude Code. Version-filtered retrieval further improves compression on the target Lean version, and refactored miniF2F proofs exhibit stronger zero-shot version transfer to future Lean releases than their unrefactored counterparts.

2605.20222 2026-05-21 quant-ph cs.LG

Quantum End-to-End Learning for Contextual Combinatorial Optimization

量子端到端学习用于上下文组合优化

Jaehwan Lee, Changhyun Kwon

AI总结 本文提出量子端到端学习框架QEL,用于解决上下文组合优化问题,通过量子近似优化算法实现端到端训练,有效捕捉上下文、不确定系数和最优解之间的复杂关系,避免调用NP难优化求解器,展现出在量子时代应用的潜力。

Comments 23 pages, 2 figures, preprint

详情
AI中文摘要

上下文组合优化(CCO)在不确定性决策中起关键作用,但仍是重大挑战。我们提出了量子端到端学习(QEL),这是首个基于量子计算的端到端学习框架,用于CCO,利用量子近似优化算法。受数据重新上传中状态准备和演化的整合启发,我们提出了一种上下文重新上传相分离器,共同捕捉上下文、不确定系数和最优解之间的复杂关系。这使得上下文编码器可以无缝集成到量子替代策略中,实现联合端到端训练,并保证平稳性。利用基于物理原理的优化感知结构,经典方法难以利用,我们的方法通过直接在任务损失上训练,尽管存在离散性和非凸性,仍避免调用NP难优化求解器。QEL在参数数量上显著少于经典基准,实验证明其在量子时代具有工业级应用潜力。

英文摘要

Contextual combinatorial optimization (CCO) plays a critical role in decision-making under uncertainty, yet remains a significant challenge. We present Quantum End-to-End Learning (QEL), the first quantum computing-based end-to-end learning framework for CCO that leverages Quantum Approximate Optimization Algorithms. Inspired by the integration of state preparation and evolution in data re-uploading, we propose a context re-uploading phase-separator that jointly captures the complex relations among contexts, uncertain coefficients, and optimal solutions. This allows a contextual encoder to be seamlessly integrated within a quantum surrogate policy, enabling joint end-to-end training with a stationarity guarantee. Exploiting an optimization-aware structure grounded in physical principles that classical methods cannot readily leverage, our approach demonstrates practicality by directly training on task loss despite the discreteness and nonconvexity, while avoiding calls to NP-hard optimization solvers. QEL empirically achieves competitive performance while requiring substantially fewer parameters than classical benchmarks, highlighting its industrial-level potential for the future quantum era.

2605.20218 2026-05-21 physics.soc-ph cs.AI cs.SI

Network-Based Interventions for HIV Prevention via Cascade-Aware Suppression of Transmission

基于网络的HIV预防干预:通过级联感知的传播抑制

Akseli Kangaslahti, Davin Choo, Milind Tambe, Alastair van Heerden, Cheryl Johnson

AI总结 本文提出了一种基于网络的HIV预防干预方法,通过抑制传播级联来减少新增感染。文章将问题建模为约束优化问题,并提出多项式时间的近似算法CAST,其近似比为2√|P|;在真实世界HIV网络上的评估表明该算法优于标准的公共卫生和计算机科学基线。

详情
AI中文摘要

治疗和预防人类免疫缺陷病毒(HIV)仍然是全球卫生领域的一项关键挑战。抗逆转录病毒治疗提供了实现病毒抑制的途径,可有效消除个体的传播风险,但系统性的资源约束限制了干预工作的覆盖面。本文研究如何在病毒未被抑制的个体之间有策略地分配密集型资源,以最小化传播网络中新增感染的期望级联规模。我们将这一挑战形式化为一个新的约束优化问题:我们的资源可以“治疗”病毒未被抑制个体集合P中的k人;并建立了该问题与现有计算文献之间的理论联系。随后我们提出级联感知的传播抑制(CAST)算法,这是一个多项式时间的(δ, ε)近似算法,它借助与最小k并集(MkU)问题的联系以及Hoeffding型集中不等式,达到2√|P|的近似比。在真实世界HIV网络上的大量评估表明,CAST优于标准的公共卫生和计算机科学基线。此外,我们还表明CAST在多种传染病网络、不同的边概率初始化以及网络数据不完美的情形下均具有经验上的鲁棒性。

英文摘要

Treating and preventing Human Immunodeficiency Virus (HIV) remains a critical global health challenge. While antiretroviral therapy provides a path toward viral suppression -- effectively eliminating an individual's transmission risk -- systemic resource constraints limit the reach of intervention efforts. This work addresses the strategic distribution of intensive resources among virally unsuppressed individuals to minimize the expected cascade of new infections within a transmission network. We formalize this challenge as a novel constrained optimization problem where we have resources to "treat" $k$ out of a set $\mathbf{P}$ of virally unsuppressed individuals, and establish its theoretical connections to existing computational literature. We then propose Cascade-Aware Suppression of Transmission (CAST), a polynomial-time $(δ, ε)$-approximation algorithm that achieves a $2\sqrt{|\mathbf{P}|}$ approximation ratio by leveraging connections to the Minimum-$k$-Union (MkU) problem and Hoeffding-style concentration bounds. Extensive evaluations on real-world HIV networks demonstrate that CAST outperforms standard public health and computer science baselines. Furthermore, we show that CAST is empirically robust across diverse infectious disease networks, varied edge probability initializations, and settings involving imperfect network data.

2605.20210 2026-05-21 cs.CY cs.AI cs.MA

Governance by Design: Architecting Agentic AI for Organizational Learning and Scalable Autonomy

Nelly Dux, Cristina Alaimo, Philippe Roussiere, Abhishek Kumar Mishra

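AI summary This paper examines the design and governance of agentic AI for organizational learning and scalable autonomy. Through a case study, it shows how effective governance is implemented through concrete architectural and working arrangements, and it distills seven lessons.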

Comments 17 pages, 1 figure, 3 tables

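Details
AI abstract (translated from Chinese)

Agentic AI systems, which can pursue goals through multi-step planning and tool-mediated action under limited direct supervision, are moving from experimental prototypes to enterprise deployments. This shift creates tensions in implementation, scaling, and governance: organizations want scalable autonomy for knowledge and coordination work, yet must preserve accountability, safety, cost control, and responsibility as systems initiate actions, access enterprise data, and evolve through iterative updates. Drawing on an in-depth qualitative case of a large IT services company that developed and rolled out an agentic system in stages during 2025, the authors show that governance is implemented through concrete architectural and working arrangements. These arrangements determine what the system is allowed to do, which tools and data it can use, how memory is handled, and how performance improvements are introduced over time. The authors then distill seven lessons on building effective governance into agentic AI during operationalization and scaling.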

English abstract

Agentic AI systems - systems that can pursue goals through multi-step planning and tool-mediated action with limited direct supervision - are moving from experimental prototypes to enterprise deployments. This transition introduces tensions in implementation, scaling, and governance: organizations seek scalable autonomy for knowledge and coordination work, yet must preserve accountability, safety, cost control, and responsibility as systems initiate actions, access enterprise data, and evolve through iterative updates. Building on an in-depth qualitative case of a large IT services company's 2025 development and staged rollout of an agentic system integrated with enterprise tools, we show that governance is implemented through concrete architectural and working arrangements that determine what the system is allowed to do, which tools and data it can use, how memory is handled, and how performance improvements are introduced over time. We then distill seven lessons that explain how to build effective governance into agentic AI during operationalization and scaling.

2605.20209 2026-05-21 cs.GR cs.LG cs.RO

NaP-Control: Navigating Diffusion Prior for Versatile and Fast Character Control

Chia-Wen Chen, Yan Wu, Korrawe Karunratanakul, Siyu Tang

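AI summary This paper proposes NaP-Control, which uses reinforcement learning to manipulate the latent noise of a task-agnostic diffusion policy prior, giving fast, robust, high-fidelity character control. The method also optimizes task rewards through environment interaction during training, which improves success rates and supports adaptation to challenging scenarios.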

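Details
AI abstract (translated from Chinese)

Precise, versatile whole-body character control in physics-based animation remains challenging. Recent diffusion-based policies generate rich, expressive motions, but they usually depend on gradient-based test-time guidance to meet task objectives, which slows inference and can reduce robustness. The authors introduce NaP-Control (Navigating Diffusion Prior for Versatile and Fast Character Control), abbreviated as NaP. The method uses reinforcement learning to manipulate the latent noise of a task-agnostic diffusion policy prior and steer it toward task-specific behaviors, yielding fast, robust, high-fidelity control. Unlike methods that rely only on offline training, NaP interacts with the environment during training to correct motions and optimize task rewards, which raises success rates and lets the system adapt to challenging scenarios. Because NaP predicts task-optimized diffusion noise directly, it removes iterative guidance from the denoising process and makes inference efficient. Experiments show that NaP achieves higher success rates and faster inference across diverse tasks while keeping motion natural.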

English abstract

Achieving precise, versatile whole-body character control in physics-based animation remains challenging. Recent diffusion-based policies generate rich and expressive motions but typically rely on gradient-based test-time guidance to satisfy task objectives, which is slow and can reduce robustness. We introduce NaP-Control (Navigating Diffusion Prior for Versatile and Fast Character Control), abbreviated as NaP. Our method uses reinforcement learning to manipulate the latent noise of a task-agnostic diffusion policy prior, steering it toward task-specific behaviors for fast, robust control with high motion fidelity. In contrast to methods that rely solely on offline training, NaP interacts with the environment during training to correct motions and optimize task rewards, improving success rates and enabling adaptation to challenging scenarios. By directly predicting task-optimized diffusion noise, NaP eliminates iterative guidance during denoising and enables efficient inference. Experiments show that NaP attains higher success rates and faster inference while preserving natural motion across diverse tasks.