arXivDaily: arXiv daily academic digest, updated Monday through Friday
2512.17893 2026-06-09 quant-ph cs.AI Version update

Exploring the Effect of Basis Rotation on NQS Performance

Sven Benjamin Kožić, Vinko Zlatić, Fabio Franchini, Salvatore Marco Giampaolo

Affiliations: Institut Ruđer Bošković

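AI summary: Uses the exactly solvable one-dimensional Ising model to study how local basis rotations affect the representation and optimization of neural quantum states (NQS). Basis rotations leave the optimization landscape unchanged but move the target state's location in parameter space, so that optimization failure can coexist with an incorrect wavefunction structure.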

Abstract

Neural Quantum States (NQS) are powerful variational representations of quantum many-body wavefunctions, yet their performance depends sensitively on the chosen basis. Using an exactly solvable one-dimensional Ising model, we show that local basis rotations leave the minimization landscape unchanged while relocating the exact ground state in parameter space. This provides a controlled framework to disentangle representational limitations from optimization-induced trainability effects. This geometric displacement, quantified through information-geometric measures, can steer optimization of shallow architectures toward saddle points and high-curvature regions. As a result, low energy errors may coexist with an incorrect wavefunction structure. By comparing energy and infidelity optimization within the same variational architectures, we show that optimization failure can persist even when the rotated target state remains representable. Our results identify a geometric mechanism contributing to basis dependence in NQS and motivate landscape-aware variational design.
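The starting observation, that a local basis rotation U preserves energies because ⟨ψ|H|ψ⟩ = ⟨Uψ|UHU†|Uψ⟩ while relocating the exact ground state, can be checked on a toy problem. The sketch below is our own illustration, not the authors' setup: it assumes a 3-site open transverse-field Ising chain with J = h = 1 and a Hadamard on every site as the (hypothetical) choice of local rotation. The spectrum is unchanged, while the ground-state amplitudes in the computational basis change.

```python
import numpy as np
from functools import reduce

# Single-site Pauli matrices and the Hadamard gate (used as the local rotation)
I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
HAD = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)


def site_op(op, site, n):
    """Embed a single-site operator at position `site` of an n-site chain."""
    ops = [op if k == site else I2 for k in range(n)]
    return reduce(np.kron, ops)


def tfim_hamiltonian(n, J=1.0, h=1.0):
    """Open-chain transverse-field Ising model: H = -J sum Z_i Z_{i+1} - h sum X_i."""
    H = np.zeros((2**n, 2**n))
    for i in range(n - 1):
        H -= J * site_op(Z, i, n) @ site_op(Z, i + 1, n)
    for i in range(n):
        H -= h * site_op(X, i, n)
    return H


n = 3
H = tfim_hamiltonian(n)
U = reduce(np.kron, [HAD] * n)          # product of local (single-site) rotations
H_rot = U @ H @ U.conj().T              # same physics expressed in the rotated basis

E, V = np.linalg.eigh(H)                # eigenvalues in ascending order
E_rot, V_rot = np.linalg.eigh(H_rot)

psi0 = V[:, 0]                          # ground state in the original basis
phi0 = V_rot[:, 0]                      # ground state in the rotated basis

print("spectra equal:", np.allclose(E, E_rot))
print("overlap |<U psi0|phi0>|:", abs(np.vdot(U @ psi0, phi0)))
print("amplitudes (original):", np.round(psi0, 3))
print("amplitudes (rotated): ", np.round(phi0, 3))
```

In this toy case the original ground state has nonzero amplitude on every configuration, while the rotated one vanishes on every configuration with an odd number of flipped spins, even though both describe the same physical state.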

2512.11000 2026-06-09 q-bio.NC cs.AI cs.NE Version update

Unambiguous Representations in Neural Networks: An Information-Theoretic Approach to Intentionality

Francesco Lässig

Affiliations: University of Tübingen

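AI summary: Defines representational ambiguity information-theoretically and shows experimentally that the connectivity structure of a neural network can encode representational content unambiguously, with ambiguity orthogonal to behavioral accuracy.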

Comments Presented at the Models of Consciousness 6 (MoC6) conference (https://amcs-community.org/moc6-schedule-information/#abstract-36)

Abstract

Representations pervade our daily experience, from letters representing sounds to bit strings encoding digital files. While such representations require externally defined decoders to convey meaning, conscious experience is fundamentally different: a neural state corresponding to perceiving a red square cannot alternatively encode the experience of a green triangle. This intrinsic property of consciousness suggests that conscious representations must be unambiguous in a way that conventional representations are not. We formalize this intuition using information theory, defining representational ambiguity as the conditional entropy H(I|R) over possible interpretations I given a representation R. Through experiments on neural networks trained to classify MNIST digits, we demonstrate that relational structures in network connectivity can unambiguously encode representational content. From relational structure alone, we achieve perfect (100%) accuracy for dropout-trained networks and 38% for standard backpropagation (chance: 10%) in identifying output neuron class identity, despite identical task performance, demonstrating that representational ambiguity can arise orthogonally to behavioral accuracy. We further show that spatial position of input neurons, relevant to phenomenal properties like visual field location, can be decoded from network connectivity with R^2 up to 0.844. These results provide a quantitative method for measuring representational ambiguity in neural systems and demonstrate that neural networks can exhibit the low-ambiguity representations posited as necessary (though not sufficient) by theoretical accounts such as narrow representationalism and IIT.
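Representational ambiguity H(I|R) is an ordinary conditional entropy, H(I|R) = -Σ_r p(r) Σ_i p(i|r) log₂ p(i|r). A minimal sketch, assuming the joint distribution p(I, R) is available as a table; the two toy tables are hypothetical and not taken from the paper:

```python
import numpy as np


def conditional_entropy(joint):
    """H(I|R) in bits for a joint table joint[i, r] = p(I=i, R=r).

    Rows index interpretations I, columns index representations R.
    """
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()                 # normalise to a distribution
    p_r = joint.sum(axis=0)                     # marginal p(R=r)
    h = 0.0
    for r in range(joint.shape[1]):
        if p_r[r] == 0:
            continue                            # unused representation contributes nothing
        p_i_given_r = joint[:, r] / p_r[r]      # p(I | R=r)
        nz = p_i_given_r > 0                    # 0 * log 0 is treated as 0
        h -= p_r[r] * np.sum(p_i_given_r[nz] * np.log2(p_i_given_r[nz]))
    return h


# Each representation admits exactly one interpretation: zero ambiguity.
unambiguous = np.array([[0.5, 0.0],
                        [0.0, 0.5]])
# Each representation is equally compatible with both interpretations.
ambiguous = np.array([[0.25, 0.25],
                      [0.25, 0.25]])

print(conditional_entropy(unambiguous))   # 0.0 bits
print(conditional_entropy(ambiguous))     # 1.0 bit
```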

2507.20975 2026-06-09 stat.ML cs.LG Version update

Locally Adaptive Conformal Inference for Operator Models

Trevor Harris, Yan Liu

Affiliations: University of Connecticut; Meta Platforms Inc

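AI summary: Proposes Local Sliced Conformal Inference (LSCI), a distribution-free framework that produces function-valued, locally adaptive prediction sets for operator models; on synthetic and real tasks it yields tighter, more adaptive sets than conformal baselines.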

Comments 12 pages, 3 figures, 2 tables, Preprint

Abstract

Operator models are regression algorithms between Banach spaces of functions. They have become an increasingly critical tool for spatiotemporal forecasting and physics emulation, especially in high-stakes scenarios where robust, calibrated uncertainty quantification is required. We introduce Local Sliced Conformal Inference (LSCI), a distribution-free framework for generating function-valued, locally adaptive prediction sets for operator models. We prove finite-sample validity and derive a data-dependent upper bound on the coverage gap under local exchangeability. On synthetic Gaussian-process tasks and real applications (air quality monitoring, energy demand forecasting, and weather prediction), LSCI yields tighter sets with stronger adaptivity compared to conformal baselines. We also empirically demonstrate robustness against biased predictions and certain out-of-distribution noise regimes.
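The abstract does not spell out LSCI itself. For orientation only, the sketch below shows the plain split-conformal baseline for function-valued outputs that locally adaptive methods refine: a global, non-adaptive band built from the sup-norm of calibration residuals on a shared evaluation grid. The data, the stand-in predictor, and all names are hypothetical.

```python
import numpy as np


def split_conformal_band(pred_cal, y_cal, pred_test, alpha=0.1):
    """Sup-norm split-conformal band for function-valued predictions.

    pred_cal, y_cal: arrays of shape (n_cal, n_grid) on a common grid.
    pred_test: array of shape (n_test, n_grid).
    Returns (lower, upper) bands with marginal coverage >= 1 - alpha,
    assuming calibration and test pairs are exchangeable.
    """
    scores = np.max(np.abs(y_cal - pred_cal), axis=1)   # one score per curve
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))             # finite-sample rank
    q = np.inf if k > n else np.sort(scores)[k - 1]
    return pred_test - q, pred_test + q


rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 50)
true_f = np.sin(2 * np.pi * grid)


def sample(n):
    # Hypothetical data: a fixed curve plus i.i.d. Gaussian noise on the grid
    return true_f + 0.1 * rng.standard_normal((n, grid.size))


y_cal, y_test = sample(500), sample(2000)
pred_cal = np.tile(true_f, (500, 1))        # stand-in "operator model" output
pred_test = np.tile(true_f, (2000, 1))

lo, hi = split_conformal_band(pred_cal, y_cal, pred_test, alpha=0.1)
covered = np.all((y_test >= lo) & (y_test <= hi), axis=1)
print("empirical coverage:", covered.mean())
```

The band has the same width everywhere on the grid and for every input, which is exactly the lack of local adaptivity that LSCI is designed to address.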

2510.04593 2026-06-09 eess.AS cs.SD Version update

UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models

Wenhao Guan, Zhikang Niu, Ziyue Jiang, Kaidi Wang, Peijie Chen, Qingyang Hong, Lin Li, Xie Chen

Affiliations: Xiamen University, China; Shanghai Innovation Institute, China; Shanghai Jiao Tong University, China; Zhejiang University, China

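AI summary: Proposes UniVoice, which unifies speech recognition and synthesis through continuous representations by combining autoregressive modeling with flow matching; a dual attention mechanism addresses the differences between the two modes, enabling high-quality zero-shot voice cloning.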

Comments Accepted at Interspeech 2026

Abstract

Large language models (LLMs) have demonstrated promising performance in both automatic speech recognition (ASR) and text-to-speech (TTS) systems, gradually becoming the mainstream approach. However, most current approaches address these tasks separately rather than through a unified framework. This work aims to integrate these two tasks into one unified model. Although discrete speech tokenization enables joint modeling, its inherent information loss limits performance in both recognition and generation. In this work, we present UniVoice, a unified LLM framework built on continuous representations that integrates speech recognition and synthesis within a single model. Our approach combines the strengths of autoregressive modeling for speech recognition with flow matching for high-quality generation. To mitigate the inherent divergence between autoregressive and flow-matching models, we further design a dual attention mechanism, which switches between a causal mask for recognition and a bidirectional attention mask for synthesis. Furthermore, the proposed text-prefix-conditioned speech infilling method enables high-fidelity zero-shot voice cloning. Experimental results demonstrate that our method can match or exceed current single-task modeling methods in both ASR and zero-shot TTS tasks. This work explores new possibilities for end-to-end speech understanding and generation. Code is available at https://github.com/gwh22/UniVoice.
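The dual attention mechanism switches between a causal mask for recognition and a bidirectional mask for synthesis. A minimal sketch of such mask switching, assuming boolean masks where True means "may attend" and a hypothetical `task` flag; UniVoice's actual masking (for example, how the text prefix is handled) may differ:

```python
import numpy as np


def attention_mask(seq_len, task):
    """Return a (seq_len, seq_len) boolean mask; mask[q, k] = True lets query q attend to key k."""
    if task == "asr":
        # Causal: each position sees itself and earlier positions only
        return np.tril(np.ones((seq_len, seq_len), dtype=bool))
    if task == "tts":
        # Bidirectional: every position sees every other position
        return np.ones((seq_len, seq_len), dtype=bool)
    raise ValueError(f"unknown task: {task}")


def masked_softmax_attention(q, k, v, mask):
    """Scaled dot-product attention with disallowed positions set to -inf."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)                       # exp(-inf) = 0 for masked keys
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))                    # 4 positions, 8-dim features
out_asr = masked_softmax_attention(x, x, x, attention_mask(4, "asr"))
out_tts = masked_softmax_attention(x, x, x, attention_mask(4, "tts"))
print(attention_mask(4, "asr").astype(int))
```

Under the causal mask the first position can only attend to itself, whereas the last position sees the whole sequence in both modes, so the two outputs coincide there.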

2511.00934 2026-06-09 cs.LO cs.RO Version update

pacSTL: PAC-Bounded Signal Temporal Logic from Data-Driven Reachability Analysis

Hanna Krasowski, Elizabeth Dietrich, Emir Cem Gezer, Roger Skjetne, Asgeir Johan Sørensen, Murat Arcak

Affiliations: University of California, Berkeley

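AI summary: Proposes the pacSTL framework, which combines PAC-bounded reachable-set prediction with interval STL; it computes lower and upper bounds on atomic robustness by solving optimization problems and propagates them, giving PAC-bounded robustness evaluation at the specification level for verifying and monitoring uncertain dynamical systems.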

Abstract

Signal Temporal Logic (STL) is an expressive language for specifying behaviors of dynamical systems from continuous signals. However, a limitation of standard STL is its inherently deterministic semantics, which prevents it from accommodating uncertainty. Existing approaches to overcome this limitation are computationally costly and limit real-time capability, requiring repeated trajectory sampling or the redesign of probability distributions over atomic propositions whenever the atomic propositions or specifications change. We introduce pacSTL, a framework that combines Probably Approximately Correct (PAC)-bounded reachable set predictions with an interval extension of STL. pacSTL computes lower and upper bounds on atomic robustness values by solving optimization problems over PAC-bounded reachable sets and propagates the bounds through the temporal logic operators. The resulting evaluation yields a PAC-bounded robustness interval at the specification level. We demonstrate the efficiency and relevance of pacSTL by verifying a quadrotor flight scenario and runtime monitoring a maritime navigation specification.
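Propagating robustness bounds through temporal operators can be illustrated with interval arithmetic: STL robustness for "always" is a min over time and for "eventually" a max, and both are monotone, so applying them separately to lower and upper bounds gives a valid interval. A minimal sketch under that assumption; the atomic bounds here are hypothetical inputs, whereas pacSTL obtains them by solving optimization problems over PAC-bounded reachable sets:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float


def always(intervals):
    """Robustness of G(phi) over a window: min over time, applied to both bounds."""
    return Interval(min(i.lo for i in intervals), min(i.hi for i in intervals))


def eventually(intervals):
    """Robustness of F(phi) over a window: max over time, applied to both bounds."""
    return Interval(max(i.lo for i in intervals), max(i.hi for i in intervals))


def conj(a, b):
    """Robustness of (phi AND psi) is the min of the two robustness values."""
    return Interval(min(a.lo, b.lo), min(a.hi, b.hi))


def negate(a):
    """Robustness of NOT phi flips the sign, which swaps the bounds."""
    return Interval(-a.hi, -a.lo)


# Hypothetical atomic robustness bounds for "distance to obstacle > 1" at t = 0..3
atom = [Interval(0.8, 1.5), Interval(0.2, 0.9), Interval(-0.1, 0.6), Interval(0.4, 1.2)]

spec = always(atom)
print(spec)   # Interval(lo=-0.1, hi=0.6)
if spec.lo > 0:
    print("satisfied for all reachable behaviours (within the PAC bound)")
elif spec.hi < 0:
    print("violated for all reachable behaviours (within the PAC bound)")
else:
    print("inconclusive")
```

An interval straddling zero, as here, is the interval-STL analogue of "unknown": the specification may or may not hold for behaviours inside the reachable set.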

2504.05349 2026-06-09 stat.ML cs.AI cs.LG Version update

Hyperflux: Pruning Reveals Importance

Eugen Barbulescu, Antonio Alexoaie, Lucian Busoniu

Affiliations: Department of Computer Science; Technical University of Cluj-Napoca; Department of Automation

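AI summary: Proposes Hyperflux, which models pruning as a continuously evolving system (flux and pressure) to explain pruning behavior at the microscopic and macroscopic levels, and introduces a pressure scheduler that reaches a target sparsity, achieving competitive results on multiple datasets.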

Abstract

Network pruning is used to reduce inference latency and power consumption in large neural networks. However, most methods focus on empirical results at the expense of understanding the pruning process. We introduce Hyperflux, a novel $L_0$ method which models pruning as a continuously evolving system determined by flux, the gradient response to a weight's removal, and pressure, a global regularization driving weights toward pruning. By exploiting this model, Hyperflux's pruning behavior becomes understandable at both microscopic (weight regrowth/pruning) and macroscopic (sparsity convergence, etc.) levels. We also introduce a novel pressure scheduler that reliably targets desired sparsities. Hyperflux achieves competitive results with ResNet-50, VGG-19 and DeiT-T/S on CIFAR-10, CIFAR-100 and ImageNet datasets.

2510.17947 2026-06-09 cs.CR cs.AI cs.CL cs.LG cs.MA Version update

PLAGUE: Plug-and-play framework for Lifelong Adaptive Generation of Multi-turn Exploits

Neeladri Bhuiya, Madhav Aggarwal, Diptanshu Purwar

Affiliations: A10 Networks, Inc.; University of Massachusetts Amherst

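AI summary: Proposes the PLAGUE framework, whose three-phase design inspired by lifelong learning (Primer, Planner, Finisher) enables efficient multi-turn jailbreak attacks, raising ASR by more than 30% on strongly safety-aligned models such as o3 and Opus 4.1.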

Comments Accepted at ICLR 2026

Abstract

Large Language Models (LLMs) are improving at an exceptional rate. With the advent of agentic workflows, multi-turn dialogue has become the de facto mode of interaction with LLMs for completing long and complex tasks. Even as LLM capabilities continue to improve, they are increasingly susceptible to jailbreaking, especially in multi-turn scenarios where harmful intent can be subtly injected across the conversation to produce nefarious outcomes. While single-turn attacks have been extensively explored, adaptability, efficiency and effectiveness remain key challenges for their multi-turn counterparts. To address these gaps, we present PLAGUE, a novel plug-and-play framework for designing multi-turn attacks inspired by lifelong-learning agents. PLAGUE dissects the lifetime of a multi-turn attack into three carefully designed phases (Primer, Planner and Finisher) that enable a systematic and information-rich exploration of the multi-turn attack family. Evaluations show that red-teaming agents designed using PLAGUE achieve state-of-the-art jailbreaking results, improving attack success rates (ASR) by more than 30% across leading models with a smaller or comparable query budget. In particular, PLAGUE achieves an ASR (based on StrongReject) of 81.4% on OpenAI's o3 and 67.3% on Anthropic's Claude Opus 4.1, two models that are considered highly resistant to jailbreaks in the safety literature. Our work offers tools and insights to understand the importance of plan initialization, context optimization and lifelong learning in crafting multi-turn attacks for a comprehensive model vulnerability evaluation.

2505.08908 2026-06-09 math.ST cs.LG econ.TH stat.TH Version update

Statistical Decision Theory with Counterfactual Loss

Benedikt Koch, Kosuke Imai

Affiliations: Harvard University

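AI summary: Addressing the fact that classical statistical decision theory ignores counterfactual information, shows that under strong ignorability the counterfactual risk is identifiable if and only if the loss is additive in the potential outcomes, proves that additive counterfactual losses capture decision difficulty, and uses a symbolic linear inverse program to determine identifiability without data.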

Abstract

Many researchers apply classical statistical decision theory to evaluate treatment choices and learn optimal policies. However, because this framework relies solely on realized outcomes under chosen actions and ignores counterfactuals, it cannot assess the quality of a decision relative to feasible alternatives at the unit level, which is an important requirement in some settings. For example, in pretrial bail decisions, a judge must balance crime prevention upon release against the risk of imposing unnecessary burdens on arrestees. A central challenge in this framework is identification: since only one potential outcome is observed per unit, counterfactual risk is typically not identifiable. We show that, under strong ignorability, counterfactual risk is identifiable if and only if the loss is additive in the potential outcomes. We further demonstrate that additive counterfactual losses can yield treatment recommendations that differ from those based on standard losses when more than two treatment options are available. We show that additive counterfactual losses capture not only decision accuracy but also decision difficulty, whereas standard losses reflect accuracy alone. Finally, we introduce a symbolic linear inverse program that determines whether a given counterfactual loss yields an identifiable risk, without requiring data.
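Why additivity helps can be seen in simulation: if the loss splits as L(d; y0, y1) = l0(d, y0) + l1(d, y1), each term involves only one potential outcome, so under strong ignorability it can be estimated from the units in which that outcome is observed, for instance by inverse propensity weighting (IPW). The sketch below assumes a binary treatment, a known propensity score, and hypothetical losses and policy; it illustrates the "if" direction only and is not the paper's symbolic linear inverse program.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data-generating process; the propensity score e1(x) is known here
x = rng.standard_normal(n)
e1 = 1.0 / (1.0 + np.exp(-x))                        # P(A = 1 | X = x)
a = (rng.random(n) < e1).astype(int)                 # treatment depends on x only
y0 = (rng.random(n) < 0.3).astype(int)               # potential outcome under a = 0
y1 = (rng.random(n) < 1.0 / (1.0 + np.exp(-(x - 0.5)))).astype(int)  # under a = 1
y = np.where(a == 1, y1, y0)                         # only one outcome is observed


def policy(x):
    """Hypothetical decision rule d(x): treat when x > 0."""
    return (x > 0).astype(int)


# Hypothetical additive counterfactual loss L(d; y0, y1) = l0(d, y0) + l1(d, y1)
def l0(d, y0):
    return (d == 0) * y0 * 1.0          # adverse outcome when not treated


def l1(d, y1):
    return 0.5 * d + (d == 1) * y1      # cost of treating plus adverse outcome if treated


d = policy(x)
# Oracle risk uses BOTH potential outcomes, which is only possible in simulation
oracle = np.mean(l0(d, y0) + l1(d, y1))
# IPW: each additive term only needs the arm in which its outcome is observed
ipw = np.mean((a == 0) * l0(d, y) / (1.0 - e1) + (a == 1) * l1(d, y) / e1)
print(oracle, ipw)
```

A loss with an interaction term such as y0 * y1 would depend on the joint distribution of the two potential outcomes, which no single-arm reweighting of this kind can recover.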

2510.12744 2026-06-09 stat.ML cs.LG math.ST stat.CO stat.ME stat.TH Version update

Dendrograms of Mixing Measures for Softmax-Gated Gaussian Mixture of Experts: Consistency Without Model Sweeps

Do Tien Hai, Trung Nguyen Mai, TrungTin Nguyen, Nhat Ho, Binh T. Nguyen, Christopher Drovandi

Affiliations: Faculty of Mathematics and Computer Science, University of Science, Ho Chi Minh City, Vietnam; Vietnam National University Ho Chi Minh City, Vietnam; Faculty of Information Technology, University of Science, Ho Chi Minh City, Vietnam; ARC Centre of Excellence for the Mathematical Analysis of Cellular Systems; School of Mathematical Sciences, Queensland University of Technology, Brisbane City, Australia; Department of Statistics and Data Science, University of Texas at Austin, Austin, USA

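AI summary: For softmax-gated Gaussian mixture of experts models, proposes a unified statistical framework based on Voronoi loss functions that addresses parameter non-identifiability and model selection, and introduces dendrograms of mixing measures for consistent selection of the number of experts without multi-size training.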

Comments Do Tien Hai, Trung Nguyen Mai, and TrungTin Nguyen are co-first authors. In Proceedings of The 29th International Conference on Artificial Intelligence and Statistics, AISTATS 2026 Spotlight, Acceptance rate 2.5% over 2102 submissions

Abstract

We develop a unified statistical framework for softmax-gated Gaussian mixture of experts (SGMoE) that addresses three long-standing obstacles in parameter estimation and model selection: (i) non-identifiability of gating parameters up to common translations, (ii) intrinsic gate-expert interactions that induce coupled differential relations in the likelihood, and (iii) the tight numerator-denominator coupling in the softmax-induced conditional density. Our approach introduces Voronoi-type loss functions aligned with the gate-partition geometry and establishes finite-sample convergence rates for the maximum likelihood estimator (MLE). In over-specified models, we reveal a link between the MLE's convergence rate and the solvability of an associated system of polynomial equations characterizing near-nonidentifiable directions. For model selection, we adapt dendrograms of mixing measures to SGMoE, yielding a consistent, sweep-free selector of the number of experts that attains pointwise-optimal parameter rates under overfitting while avoiding multi-size training. Simulations on synthetic data corroborate the theory, accurately recovering the expert count and achieving the predicted rates for parameter estimation while closely approximating the regression function. Under model misspecification (e.g., $\epsilon$-contamination), the dendrogram selection criterion is robust, recovering the true number of mixture components, while the Akaike information criterion, the Bayesian information criterion, and the integrated completed likelihood tend to overselect as sample size grows. On a maize proteomics dataset of drought-responsive traits, our dendrogram-guided SGMoE selects two experts, exposes a clear mixing-measure hierarchy, stabilizes the likelihood early, and yields interpretable genotype-phenotype maps, outperforming standard criteria without multi-size training.
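The dendrogram idea can be pictured as repeatedly merging the two closest atoms of a fitted mixing measure and recording the merge heights; a large jump in height suggests how many components to keep. The sketch below is a simplified illustration, not the paper's procedure: it assumes atoms are (weight, location) pairs and uses a hypothetical dissimilarity w_i w_j / (w_i + w_j) * ||θ_i - θ_j||², merging into a weight-averaged location. Gating parameters of the softmax-gated model are ignored here.

```python
import numpy as np


def dendrogram_of_mixing_measure(weights, atoms):
    """Greedily merge atoms of a discrete mixing measure.

    weights: shape (K,), positive, summing to 1.
    atoms:   shape (K, d), component parameters.
    Returns a list of merge heights, one per merge (K - 1 in total).
    """
    w = list(np.asarray(weights, dtype=float))
    th = [np.asarray(a, dtype=float) for a in atoms]
    heights = []
    while len(w) > 1:
        best = None
        for i in range(len(w)):
            for j in range(i + 1, len(w)):
                d = w[i] * w[j] / (w[i] + w[j]) * np.sum((th[i] - th[j]) ** 2)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged_w = w[i] + w[j]
        merged_th = (w[i] * th[i] + w[j] * th[j]) / merged_w
        # Delete j first (j > i) so that index i stays valid
        for idx in (j, i):
            del w[idx]
            del th[idx]
        w.append(merged_w)
        th.append(merged_th)
        heights.append(d)
    return heights


# Hypothetical over-fitted estimate: 4 atoms forming two near-duplicate pairs
weights = np.array([0.25, 0.25, 0.3, 0.2])
atoms = np.array([[0.0], [0.05], [3.0], [3.02]])
heights = dendrogram_of_mixing_measure(weights, atoms)
print(np.round(heights, 4))
# The jump between the second and third merge heights suggests cutting
# the tree at two components.
```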

2503.22697 2026-06-09 q-bio.NC cs.AI cs.CV Version update

Brain2Text Decoding Model Reveals the Neural Mechanisms of Visual Semantic Processing

Feihan Feng, Jingxin Nie

Affiliations: Ministry of Education Center for Studies of Psychological Application; Center for Studies of Psychological Application; Key Laboratory of Brain, Cognition and Education Sciences; School of Psychology; Guangdong Key Laboratory of Mental Health and Cognitive Science

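AI summary: Proposes a deep learning model that decodes semantic descriptions of natural images directly from fMRI signals, reveals the key role of higher-level visual cortex in semantic processing, and demonstrates category-specific neural representations.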

Comments 39 pages, 9 figures

Abstract

Decoding sensory experiences from neural activity to reconstruct human-perceived visual stimuli and semantic content remains a challenge in neuroscience and artificial intelligence. Despite notable progress in current brain decoding models, a critical gap persists in their systematic integration with established neuroscientific theories and in the exploration of underlying neural mechanisms. Here, we present a novel framework that directly decodes fMRI signals into textual descriptions of viewed natural images. Our deep learning model, trained without visual information, achieves state-of-the-art semantic decoding performance, generating meaningful captions that capture the core semantic content of complex scenes. Neuroanatomical analysis reveals the critical role of higher-level visual cortices, including the MT+ complex, ventral stream visual cortex, and inferior parietal cortex, in visual semantic processing. Furthermore, category-specific analysis demonstrates nuanced neural representations for semantic dimensions such as animacy and motion. This work provides a more direct and interpretable framework for semantic decoding from the brain, offering a powerful new methodology for probing the neural basis of complex semantic processing, refining the understanding of the distributed semantic network, and potentially developing brain-inspired language models.

2510.03389 2026-06-09 quant-ph cs.LG Version update

Quantum feature-map learning with reduced resource overhead

Jonas Jäger, Philipp Elsässer, Elham Torabian

Affiliations: Department of Computer Science and Institute of Applied Mathematics, University of British Columbia (UBC), Vancouver, B.C. V6T 1Z4, Canada; Stewart Blusson Quantum Matter Institute (QMI), Vancouver, B.C. V6T 1Z4, Canada; Institute of Physics, University of Freiburg, Freiburg (Breisgau), 79104, Germany; Department of Chemistry, University of British Columbia (UBC), Vancouver, B.C. V6T 1Z1, Canada

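AI summary: Proposes the Q-FLAIR algorithm, which uses partial analytic reconstructions to shift workload to a classical computer, substantially reducing quantum resource overhead; on a real IBM device it exceeds 90% accuracy on full-resolution MNIST (digits 3 vs 5) after only four hours of training.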

Comments 24 pages, 12 figures, 2 tables

Journal ref
Phys. Rev. Research 8(2), 023247 (2026)

Abstract

Current quantum computers require algorithms that use limited resources economically. In quantum machine learning, success hinges on quantum feature-maps, which embed classical data into the state space of qubits. We introduce Quantum Feature-Map Learning via Analytic Iterative Reconstructions (Q-FLAIR), an algorithm that reduces quantum resource overhead in iterative feature-map circuit construction. It shifts workloads to a classical computer via partial analytic reconstructions of the quantum model, using only a few evaluations. For each probed gate addition to the ansatz, the simultaneous selection and optimization of the data feature and weight parameter is then entirely classical. Integrated into quantum neural network and quantum kernel support vector classifiers, Q-FLAIR shows state-of-the-art benchmark performance. Since resource overhead decouples from feature dimension, we train a quantum model on a real IBM device in only four hours, surpassing 90% accuracy on the full-resolution MNIST dataset (784 features, digits 3 vs 5). Such results were previously unattainable because the feature dimension drives hardware demands for fixed ansätze, and search costs for adaptive ansätze, to prohibitive levels. Furthermore, Q-FLAIR demonstrates de-quantization robustness against direct classical modeling, satisfying a benchmark rare in the literature and a necessary condition for potential quantum advantage. By rethinking feature-map learning beyond black-box optimization, this work takes a concrete step toward enabling quantum machine learning for real-world problems and near-term quantum computers.
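Analytic reconstruction of this kind rests on a standard fact: if a rotation gate exp(-iθP/2) with a Pauli generator P enters a circuit once, the measured expectation value has the form f(θ) = a + b cos θ + c sin θ, so three evaluations determine it and its minimiser has a closed form. A minimal sketch of that single-parameter step (in the spirit of Rotosolve-style methods), with a hypothetical classical stand-in for the quantum measurement; it is not the Q-FLAIR algorithm itself, which also selects data features:

```python
import numpy as np


def reconstruct(f):
    """Recover (a, b, c) in f(theta) = a + b cos(theta) + c sin(theta) from 3 evaluations."""
    f0, fp, fm = f(0.0), f(np.pi / 2), f(-np.pi / 2)
    a = (fp + fm) / 2          # the sin terms cancel at +pi/2 and -pi/2
    c = (fp - fm) / 2
    b = f0 - a
    return a, b, c


def argmin_theta(b, c):
    """b cos(theta) + c sin(theta) = R cos(theta - phi), minimised at theta = phi + pi."""
    return np.arctan2(c, b) + np.pi


# Hypothetical "device": on hardware each call would be an estimated expectation value
def measured_expectation(theta):
    return 0.3 + 0.5 * np.cos(theta) - 0.2 * np.sin(theta)


a, b, c = reconstruct(measured_expectation)
theta_star = argmin_theta(b, c)
print(a, b, c)                                  # approximately 0.3, 0.5, -0.2
print(theta_star, measured_expectation(theta_star))
```

Once (a, b, c) are known, any further exploration of θ is a purely classical computation, which is the kind of workload shift the abstract describes.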

2502.15131 2026-06-09 math.ST cs.LG stat.ME stat.ML stat.TH Version update

Optimal and Provable Calibration in High-Dimensional Binary Classification: Angular Calibration and Platt Scaling

Yufan Li, Pragya Sur

Affiliations: Harvard University

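AI summary: For linear binary classifiers with high-dimensional Gaussian features, proposes angular calibration based on the angle between the estimated and true weights, proves that it is calibrated and uniquely Bregman-optimal, and shows that, under certain conditions, Platt scaling converges to this optimal solution in high dimensions.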

Abstract

We study the fundamental problem of calibrating a linear binary classifier of the form $\sigma(\hat{w}^\top x)$, where the feature vector $x$ is Gaussian, $\sigma$ is a link function, and $\hat{w}$ is an estimator of the true linear weight $w_\star$. By interpolating with a noninformative $\textit{chance classifier}$, we construct a well-calibrated predictor whose interpolation weight depends on the angle $\angle(\hat{w}, w_\star)$ between the estimator $\hat{w}$ and the true linear weight $w_\star$. We establish that this angular calibration approach is provably well-calibrated in a high-dimensional regime where the numbers of samples and features both diverge at a comparable rate. The angle $\angle(\hat{w}, w_\star)$ can be consistently estimated. Furthermore, the resulting predictor is uniquely $\textit{Bregman-optimal}$, minimizing the Bregman divergence to the true label distribution within a suitable class of calibrated predictors. Our work is the first to provide a calibration strategy that provably satisfies both calibration and optimality properties in high dimensions. Additionally, we identify conditions under which a classical Platt-scaling predictor converges to our Bregman-optimal calibrated solution. Thus, under these conditions, Platt scaling also provably inherits these desirable properties in high dimensions.
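Platt scaling, the classical method analysed here in high dimensions, fits a two-parameter logistic map p = sigmoid(A·s + B) to held-out scores s by maximum likelihood. A minimal Newton's-method sketch on hypothetical synthetic scores; this is the classical baseline, not the proposed angular calibration, whose interpolation weight is not given in the abstract:

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def platt_scaling(scores, labels, n_iter=50):
    """Fit p(y=1 | s) = sigmoid(A*s + B) by maximum likelihood (Newton's method)."""
    X = np.column_stack([scores, np.ones_like(scores)])   # columns: s, 1
    params = np.zeros(2)                                   # [A, B]
    for _ in range(n_iter):
        p = sigmoid(X @ params)
        grad = X.T @ (labels - p)                          # gradient of the log-likelihood
        hess = X.T @ (X * (p * (1 - p))[:, None])          # negative Hessian
        step = np.linalg.solve(hess, grad)
        params += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return params


rng = np.random.default_rng(0)
s = rng.standard_normal(20000)                  # hypothetical uncalibrated scores
y = (rng.random(20000) < sigmoid(2.0 * s - 0.5)).astype(float)
A, B = platt_scaling(s, y)
print(A, B)                                     # close to the generating values 2.0 and -0.5
calibrated = sigmoid(A * s + B)                 # calibrated probabilities
```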

2509.20714 2026-06-09 cs.CR cs.LG Version update

Cryptographic Backdoor for Neural Networks: Boon and Bane

Anh Tu Ngo, Anupam Chattopadhyay, Subhamoy Maitra

Affiliations: College of Computing and Data Science, Nanyang Technological University Singapore; Applied Statistics Unit, Indian Statistical Institute

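AI summary: Shows the dual role of cryptographic backdoors in neural networks: they can mount powerful, invisible attacks, and they can also be used for robust watermarking, user authentication, and intellectual-property tracking; these protocols are proven robust under standard assumptions.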

Comments Preprint

Abstract

In this paper we show that cryptographic backdoors in a neural network (NN) can be highly effective in two directions: mounting attacks and providing defenses. On the attack side, a carefully planted cryptographic backdoor enables a powerful and invisible attack on the NN. On the defense side, we present three applications: first, a provably robust NN watermarking scheme; second, a protocol for guaranteeing user authentication; and third, a protocol for tracking unauthorized sharing of the NN intellectual property (IP). From a broader theoretical perspective, building on ideas from Goldwasser et al. [FOCS 2022], our main contribution is to show that all of these instantiated practical protocol implementations are provably robust. The protocols for watermarking, authentication and IP tracking resist an adversary with black-box access to the NN, whereas the backdoor-enabled adversarial attack is impossible to prevent under standard assumptions. While the theoretical tools used for our attack are mostly in line with the ideas of Goldwasser et al., the proofs related to the defenses need further study. Finally, all these protocols are implemented on state-of-the-art NN architectures, with empirical results corroborating the theoretical claims. Further, post-quantum primitives can be used to implement the cryptographic backdoors, laying foundations for quantum-era applications in machine learning (ML).

2508.10239 2026-06-09 cs.HC cs.CL Version update

Breaking the Curse of Knowledge: Designing Personalized Jargon Support for Real-Time Online Meetings

Yifan Song, Yijun Liu, Wing Yee Au, Hon Yung Wong, Brian P. Bailey, Tal August

Affiliations: University of Illinois Urbana-Champaign; Fujitsu Research of America

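AI summary: Proposes the ParseJargon system, which uses user profiles and in-session feedback for personalized jargon identification, improving comprehension and engagement for cross-disciplinary listeners in online meetings.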

Comments Portions of this work appeared in CHI '26 Extended Abstracts ("Breaking the Curse of Knowledge: Toward Personalized Jargon Support in Online Meetings") and ACL '26 System Demonstrations ("ParseJargon: Personalized Real-time Jargon Support in Online Meetings")

Abstract

Cross-disciplinary communication is often hindered by specialized language (i.e., jargon) and uneven background knowledge. Recent advances in speech-to-text and large language models make it possible to provide jargon support during online meetings, but generic support (i.e., defining the same terms for everyone) can overwhelm listeners with definitions they do not need. We present ParseJargon, a system for personalized jargon support in real-time online meetings. We begin with an initial prototype to probe the use of single-sentence user profiles for personalization. We conducted a controlled study and showed that even this minimal personalization enhanced listeners' comprehension and engagement over generic support because of more precise jargon identification. Guided by insights from participants' feedback, we refined the system with more advanced personalization techniques, including in-session user feedback and portable glossary-based profiles. We evaluated how these techniques can further improve jargon identification precision using data collected in the controlled study to simulate personalization over time. We also conducted a latency test, complemented by a lightweight deployment, to analyze the system's real-time capability and usability.

2509.07779 2026-06-09 math.OC cs.LG cs.MA 版本更新

Decentralized Online Riemannian Optimization Beyond Hadamard Manifolds

超越哈达玛流形的去中心化在线黎曼优化

Emre Sahinoglu, Shahin Shahrampour

发表机构 * Department of Mechanical & Industrial Engineering at Northeastern University(美国东北大学机械与工业工程系)

AI总结 针对可能具有正曲率的流形,提出曲率感知的黎曼共识步骤,实现去中心化在线黎曼梯度下降算法,并证明O(√T)遗憾界。

AI中文摘要

我们研究在可能具有正曲率的流形上的去中心化在线黎曼优化,超越了哈达玛流形设定。去中心化优化技术依赖于共识步骤,该步骤在欧几里得空间中因其线性性质而被充分理解。然而,在正曲率黎曼空间中,一个主要的技术挑战是测地距离可能不诱导全局凸结构。在这项工作中,我们首先分析了一个曲率感知的黎曼共识步骤,该步骤使得在哈达玛流形之外也能实现线性收敛。基于此步骤,我们为去中心化在线黎曼梯度下降算法建立了$O(\sqrt{T})$遗憾界。然后,我们研究了双点bandit反馈设置,其中我们使用平滑技术采用计算高效的梯度估计器,并通过平滑目标的次凸性分析证明了相同的$O(\sqrt{T})$遗憾界。

英文摘要

We study decentralized online Riemannian optimization over manifolds with possibly positive curvature, going beyond the Hadamard manifold setting. Decentralized optimization techniques rely on a consensus step that is well understood in Euclidean spaces because of their linearity. However, in positively curved Riemannian spaces, a main technical challenge is that geodesic distances may not induce a globally convex structure. In this work, we first analyze a curvature-aware Riemannian consensus step that enables linear convergence beyond Hadamard manifolds. Building on this step, we establish an $O(\sqrt{T})$ regret bound for the decentralized online Riemannian gradient descent algorithm. Then, we investigate the two-point bandit feedback setup, where we employ computationally efficient gradient estimators using smoothing techniques, and we demonstrate the same $O(\sqrt{T})$ regret bound through the subconvexity analysis of smoothed objectives.
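For readers unfamiliar with two-point bandit feedback, the sketch below shows the standard Euclidean two-point gradient estimator built from the ball-smoothed objective. It is an illustration under stated assumptions, not the authors' estimator: the paper works on manifolds (presumably sampling directions in the tangent space), while this version samples directions on the unit sphere of R^d, and the function names are hypothetical.

```python
import math
import random


def random_unit_vector(d, rng):
    """Sample u uniformly from the unit sphere in R^d (normalized Gaussian)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def two_point_gradient(f, x, delta, rng):
    """Two-point bandit estimate of a gradient from two function values.

    g = d / (2 * delta) * (f(x + delta*u) - f(x - delta*u)) * u
    is an unbiased estimate of the gradient of the smoothed function
    f_delta(x) = E_v[f(x + delta*v)], with v uniform in the unit ball.
    Euclidean sketch only; a Riemannian version would perturb along
    tangent directions instead.
    """
    d = len(x)
    u = random_unit_vector(d, rng)
    x_plus = [xi + delta * ui for xi, ui in zip(x, u)]
    x_minus = [xi - delta * ui for xi, ui in zip(x, u)]
    scale = d / (2.0 * delta) * (f(x_plus) - f(x_minus))
    return [scale * ui for ui in u]
```

Only two function evaluations per round are needed, which is what makes the estimator "computationally efficient" in the bandit sense; averaging many estimates for a linear f recovers its gradient.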

2508.11874 2026-06-09 cs.GT cs.AI cs.DS cs.LO cs.PL 版本更新

Discovering Expert-Level Nash Equilibrium Algorithms with Large Language Models

利用大型语言模型发现专家级纳什均衡算法

Hanyu Li, Dongchen Li, Xiaotie Deng

发表机构 * CFCS, School of Computer Science, Peking University, Beijing, China(前沿计算研究中心,计算机学院,北京大学,北京,中国) School of Computing and Data Science, The University of Hong Kong, Pokfulam, Hong Kong(计算与数据科学学院,香港大学,薄扶林,香港)

AI总结 提出LegoNE框架,将专家证明策略编码为符号语言,自动验证算法的最坏情况保证,结合推理型LLM重新发现并改进了多人博弈的近似纳什均衡算法。

Comments accepted by Nature Communications

AI中文摘要

设计具有可证明最坏情况保证的近似纳什均衡(ANE)的多项式时间算法是算法博弈论中的一个基本开放问题。虽然大型语言模型(LLM)可以大规模生成候选算法,但验证最坏情况保证需要对所有博弈实例进行形式化分析——此前没有自动化系统能够完成这项任务。在这里,我们提出了LegoNE,一个将专家证明策略编码为符号语言的框架,该框架自动将任何候选算法编译成一个有限优化问题,以验证其最坏情况保证。将LegoNE与一个推理型LLM集成,我们重新发现了一个匹配双人博弈最佳多项式时间保证的算法,并发现了一个三人博弈算法,将最佳保证从$0.6+\delta$改进到$0.5+\delta$——这被证明超出了扩展技术(唯一已知的多玩家ANE设计范式)的能力范围。这些结果表明,将特定领域的证明策略编码为机器可处理的语言可以支持LLM驱动的算法发现,超越已知的人类设计范式。

英文摘要

Designing polynomial-time algorithms for approximate Nash equilibria (ANE) with provable worst-case guarantees is a fundamental open problem in algorithmic game theory. While large language models (LLMs) can generate candidate algorithms at scale, certifying worst-case guarantees requires formal analysis over all game instances -- a task for which no automated system previously existed. Here, we present LegoNE, a framework encoding expert proof strategies into a symbolic language that automatically compiles any candidate algorithm into a finite optimization problem certifying its worst-case guarantee. Integrating LegoNE with a reasoning LLM, we rediscovered an algorithm matching the best polynomial-time guarantee for two-player games, and discovered a three-player algorithm improving the best guarantee from $0.6+δ$ to $0.5+δ$ -- provably beyond the reach of the extension technique, the only previously known multi-player ANE design paradigm. These results show that encoding domain-specific proof strategies into a machine-tractable language can support LLM-driven discovery of algorithms outside known human design paradigms.
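The guarantees quoted above (e.g. $0.5+δ$) bound the approximation parameter ε of the strategy profile an algorithm outputs. As a minimal sketch, assuming the usual convention of payoffs normalized to [0, 1] and restricting to the two-player case, ε for a given profile is the larger of the two players' regrets; this is only the quantity being certified, not LegoNE or any of the discovered algorithms, and the function name is hypothetical.

```python
def approx_nash_epsilon(A, B, x, y):
    """Return the smallest eps for which (x, y) is an eps-Nash equilibrium
    of the bimatrix game (A, B): the larger of the two players' regrets."""
    m, n = len(A), len(A[0])
    # Row player's payoff for each pure row against y, and its value under x.
    row_payoffs = [sum(A[i][j] * y[j] for j in range(n)) for i in range(m)]
    row_value = sum(x[i] * row_payoffs[i] for i in range(m))
    # Column player's payoff for each pure column against x, and its value under y.
    col_payoffs = [sum(x[i] * B[i][j] for i in range(m)) for j in range(n)]
    col_value = sum(y[j] * col_payoffs[j] for j in range(n))
    return max(max(row_payoffs) - row_value, max(col_payoffs) - col_value)
```

For matching pennies with payoffs in [0, 1], the uniform profile gives ε = 0 (an exact equilibrium), while both players choosing their first action gives ε = 1, because the column player could gain 1 by deviating.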

2501.15509 2026-06-09 cs.CR cs.AI cs.LG 版本更新

FIT-Print: Towards False-claim-resistant Model Ownership Verification via Targeted Fingerprint

FIT-Print:通过目标指纹实现抗虚假声明的模型所有权验证

Shuo Shao, Haozhe Zhu, Yiming Li, Hongwei Yao, Tianwei Zhang, Zhan Qin

发表机构 * State Key Laboratory of Blockchain and Data Security, Zhejiang University(区块链与数据安全国家重点实验室,浙江大学) Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security, Hangzhou(杭州高新技术区(滨江)区块链与数据安全研究院,杭州) College of Computing and Data Science, Nanyang Technological University(南洋理工大学计算机与数据科学学院) Department of Computer Science, City University of Hong Kong(香港城市大学计算机科学系)

AI总结 针对现有模型指纹易受虚假声明攻击的问题,提出目标指纹范式FIT-Print,通过优化将指纹转化为可验证目标签名,并设计两种黑盒方法,实现100%防御成功率和0%误报率。

Comments This paper has been accepted by IEEE Transactions on Information Forensics and Security

AI中文摘要

模型指纹已成为保护开源模型知识产权的重要机制,提供了一种无需修改受保护模型的非侵入式方法。然而,我们的分析表明,现有指纹技术从根本上容易受到虚假声明攻击,即对手可以欺诈性地声称对独立的第三方模型拥有所有权。我们证明,这种脆弱性源于当前方法的非目标性,它们基于任意样本输出而非与特定预定义参考的对齐来评估模型相似性。为缓解此漏洞,我们引入了FIT-Print,一种主动对抗虚假声明攻击的目标指纹范式。具体来说,FIT-Print利用优化将指纹转化为可验证的目标签名。在此基础之上,我们提出了两种黑盒指纹方法:逐位的FIT-ModelDiff和逐列表的FIT-LIME,它们分别利用输出距离和特征归因作为鲁棒的模型签名。在基准模型和数据集上的广泛评估表明,我们的框架完美地中和了虚假声明攻击(100%防御成功率),消除了对独立模型的误报(0.0%),同时针对各种模型复用技术保持了100%的所有权验证率。

英文摘要

Model fingerprinting has emerged as a crucial mechanism for safeguarding the intellectual property of open-source models, offering a non-intrusive approach that requires no modifications to the protected model. However, our analysis reveals that existing fingerprinting techniques are fundamentally vulnerable to false claim attacks, wherein adversaries can fraudulently assert ownership over independent third-party models. We demonstrate that this vulnerability stems from the untargeted nature of current methods, which evaluate model similarity based on arbitrary sample outputs rather than alignment with a specific, predefined reference. To mitigate this vulnerability, we introduce FIT-Print, a targeted fingerprinting paradigm that actively counters false claim attacks. Specifically, FIT-Print leverages optimization to transform the fingerprint into a verifiable, targeted signature. Building upon this foundation, we propose two black-box fingerprinting methods, the bit-wise FIT-ModelDiff and the list-wise FIT-LIME, which utilize output distances and feature attributions as robust model signatures, respectively. Extensive evaluations across benchmark models and datasets show that our framework perfectly neutralizes false claim attacks (100% defense success rate) and eliminates false alarms on independent models (0.0%), all while maintaining a 100% ownership verification rate against diverse model reuse techniques.

2507.08920 2026-06-09 q-bio.BM cs.AI 版本更新

AMix-1: A Pathway to Test-Time Scalable Protein Foundation Model

AMix-1: 迈向测试时可扩展的蛋白质基础模型

Changze Lv, Jiang Zhou, Siyu Long, Lihao Wang, Jiangtao Feng, Dongyu Xue, Yu Pei, Hao Wang, Zherui Zhang, Yuchen Cai, Zhiqiang Gao, Ziyuan Ma, Jiakai Hu, Chaochen Gao, Jingjing Gong, Yuxuan Song, Shuyi Zhang, Xiaoqing Zheng, Deyi Xiong, Lei Bai, Wanli Ouyang, Ya-Qin Zhang, Wei-Ying Ma, Bowen Zhou, Hao Zhou

发表机构 * Shanghai Artificial Intelligence Laboratory(上海人工智能实验室) Generative Symbolic Intelligence Lab (GenSI), Tsinghua University(生成符号智能实验室(GenSI),清华大学) Institute for AI Industry Research (AIR), Tsinghua University(人工智能产业研究院(AIR),清华大学) Tsinghua University(清华大学) Fudan University(复旦大学) Tianjin University(天津大学) Georgia Institute of Technology(佐治亚理工学院) Beijing University of Posts and Telecommunications(北京邮电大学) University of Chinese Academy of Sciences(中国科学院大学) City University of Hong Kong(香港城市大学)

AI总结 提出基于贝叶斯流网络的蛋白质基础模型AMix-1,通过预训练缩放律、涌现能力分析、上下文学习机制和测试时缩放算法,实现1.7B参数模型,并设计出活性提高50倍的AmeR变体。

AI中文摘要

我们介绍了AMix-1,一个强大的蛋白质基础模型,它基于贝叶斯流网络构建,并通过系统性的训练方法学增强,包括预训练缩放律、涌现能力分析、上下文学习机制和测试时缩放算法。为了保证稳健的可扩展性,我们建立了一个预测性缩放律,并通过损失视角揭示了结构理解的渐进涌现,最终得到了一个强大的17亿参数模型。在此基础上,我们设计了一种基于多序列比对(MSA)的上下文学习策略,将蛋白质设计统一到一个通用框架中,其中AMix-1识别MSA中的深层进化信号,并一致地生成结构和功能上连贯的蛋白质。该框架成功设计了一个显著改进的AmeR变体,其活性比野生型提高了高达50倍。为了突破蛋白质工程的边界,我们进一步为AMix-1配备了一种进化测试时缩放算法,用于计算机模拟定向进化,随着验证预算的增加,该算法提供了显著且可扩展的性能提升,为下一代实验室在环蛋白质设计奠定了基础。

英文摘要

We introduce AMix-1, a powerful protein foundation model built on Bayesian Flow Networks and empowered by a systematic training methodology, encompassing pretraining scaling laws, emergent capability analysis, in-context learning mechanism, and test-time scaling algorithm. To guarantee robust scalability, we establish a predictive scaling law and reveal the progressive emergence of structural understanding through the lens of the loss, culminating in a strong 1.7-billion-parameter model. Building on this foundation, we devise a multiple sequence alignment (MSA)-based in-context learning strategy to unify protein design into a general framework, where AMix-1 recognizes deep evolutionary signals among MSAs and consistently generates structurally and functionally coherent proteins. This framework enables the successful design of a dramatically improved AmeR variant with an up to $50\times$ activity increase over its wild type. Pushing the boundaries of protein engineering, we further empower AMix-1 with an evolutionary test-time scaling algorithm for in silico directed evolution that delivers substantial, scalable performance gains as verification budgets are intensified, laying the groundwork for next-generation lab-in-the-loop protein design.

2508.00724 2026-06-09 eess.SY cs.RO cs.SY 版本更新

Petri Net Modeling and Deadlock-Free Scheduling of Attachable Heterogeneous AGV Systems

可连接异构AGV系统的Petri网建模与无死锁调度

Boyu Li, Zhengchen Li, Weimin Wu, Mengchu Zhou

发表机构 * State Key Laboratory of Industrial Control Technology, Zhejiang University(浙江大学工业控制技术国家重点实验室) School of Information and Electronic Engineering, Zhejiang Gongshang University(浙江工商大学信息与电子工程学院) Department of Electrical and Computer Engineering, New Jersey Institute of Technology(新泽西理工学院电气与计算机工程系)

AI总结 针对可连接异构AGV系统的调度问题,提出基于Petri网的无死锁调度框架,集成自适应大邻域搜索算法,通过结构分析预防死锁,实验表明该方法显著提升计算效率并优于现有策略。

Comments This work has been submitted to the IEEE for possible publication

AI中文摘要

对柔性自动化的日益增长的需求加速了异构自动导引车(AGV)的采用。本文研究了一个由可连接异构AGV(包括载体和穿梭车)组成的物料运输系统中的新调度问题,这些AGV可灵活连接和分离以协同执行任务。虽然这种协作提高了操作效率,但连接引起的同步使系统高度耦合且容易发生死锁。为此,我们提出了一种基于Petri网(PN)的无死锁调度框架,并将其集成到自适应大邻域搜索(ALNS)算法中。引入PN将候选解从静态排列映射为动态协作过程,从而通过状态演化进行性能评估,并通过结构分析实现主动死锁预防。在真实和合成实例上的大量实验表明,所提出的框架显著提高了计算效率,开发的ALNS优于当前现场策略、精确求解器和最先进的元启发式算法。最后,敏感性分析为最优车队规模提供了管理见解。

英文摘要

The increasing demand for flexible automation has accelerated the adoption of heterogeneous automated guided vehicles (AGVs). This work investigates a new scheduling problem in a material transportation system consisting of attachable heterogeneous AGVs, including carriers and shuttles, that flexibly attach and detach for cooperative task execution. While such collaboration enhances operational efficiency, the attachment-induced synchronization renders the system highly coupled and susceptible to deadlocks. To address this, we propose a Petri net (PN)-based deadlock-free scheduling framework integrated into an adaptive large neighborhood search (ALNS) algorithm. The PN is introduced to map candidate solutions from static permutations into dynamic collaborative processes, enabling performance evaluation via state evolution and proactive deadlock prevention through structural analysis. Extensive experiments on real-world and synthetic instances demonstrate that the proposed framework significantly improves computational efficiency, with the developed ALNS outperforming the current on-site policy, exact solvers, and state-of-the-art metaheuristics. Finally, sensitivity analysis yields managerial insights for optimal fleet sizing.
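To make the deadlock issue concrete, here is a toy Petri net sketch, assuming two AGVs that each need two shared resources (say, track segments r1 and r2) but acquire them in opposite orders. It finds deadlocks by brute-force reachability search, which is exponential in general; the paper instead prevents deadlocks through structural analysis, and all place and transition names here are hypothetical.

```python
from collections import deque


def enabled(marking, pre):
    """A transition is enabled if every input place holds enough tokens."""
    return all(marking.get(p, 0) >= k for p, k in pre.items())


def fire(marking, pre, post):
    """Fire a transition: consume input tokens, then produce output tokens."""
    new = dict(marking)
    for p, k in pre.items():
        new[p] -= k
    for p, k in post.items():
        new[p] = new.get(p, 0) + k
    return new


def freeze(marking):
    """Canonical hashable form of a marking; empty places are dropped."""
    return tuple(sorted((p, k) for p, k in marking.items() if k > 0))


def find_deadlocks(initial, transitions, final):
    """Breadth-first reachability search returning every reachable dead
    marking (no transition enabled) other than the intended final one."""
    seen = {freeze(initial)}
    queue = deque([initial])
    dead = []
    while queue:
        m = queue.popleft()
        successors = [fire(m, pre, post)
                      for pre, post in transitions.values() if enabled(m, pre)]
        if not successors and freeze(m) != freeze(final):
            dead.append(freeze(m))
        for s in successors:
            key = freeze(s)
            if key not in seen:
                seen.add(key)
                queue.append(s)
    return dead


transitions = {
    # AGV a grabs r1, then r2, then finishes and releases both.
    "a_take_r1": ({"a0": 1, "r1": 1}, {"a1": 1}),
    "a_take_r2": ({"a1": 1, "r2": 1}, {"a2": 1, "r1": 1, "r2": 1}),
    # AGV b grabs the same resources in the opposite order.
    "b_take_r2": ({"b0": 1, "r2": 1}, {"b1": 1}),
    "b_take_r1": ({"b1": 1, "r1": 1}, {"b2": 1, "r1": 1, "r2": 1}),
}
initial = {"a0": 1, "b0": 1, "r1": 1, "r2": 1}
final = {"a2": 1, "b2": 1, "r1": 1, "r2": 1}
print(find_deadlocks(initial, transitions, final))  # → [(('a1', 1), ('b1', 1))]
```

The single deadlock is the marking where each AGV holds one resource and waits for the other, which is the kind of attachment-induced coupling the abstract describes; making both AGVs acquire r1 first removes it.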

2407.10247 2026-06-09 cs.CY cs.AI cs.LG econ.GN q-fin.EC 版本更新

Strategic Integration of Artificial Intelligence in the C-Suite: The Role of the Chief AI Officer

人工智能在C级管理层的战略整合:首席人工智能官的角色

Marc Schmitt

发表机构 * University of Oxford(牛津大学)

AI总结 本文提出角色设计理论,解释企业为何设立首席AI官(CAIO)或采用其他结构,并分析AI的独特属性(分布式判断问责、上游治理、非平稳性)如何影响高管角色设计。

详情
AI中文摘要

人工智能(AI)融入企业战略已成为组织在数字时代保持竞争优势的关键。尽管组织日益将AI视为战略和组织资源,但现有的C级管理层角色仅部分具备在企业层面统一治理、整合和利用AI的能力。各组织的应对方式不同:有的设立专职首席AI官(CAIO),有的将现有职责扩展为混合角色,还有的通过联邦式结构协调AI。本文发展了一种角色设计理论来解释这种差异。我识别出AI区别于以往跨领域企业技术的三个属性——分布式判断问责、上游治理和非平稳性——以及组织应对的三种配置:集中扩展、分布式扩展和角色创建。CAIO框架将这些属性与它们产生的行政设计问题以及专职角色所需的功能和能力联系起来。四个命题具体说明了专职CAIO何时出现、组织采取何种形式、专职角色何时有效以及配置如何随时间演变。本文通过提供高管层面AI战略整合的理论驱动解释,为高管领导力、组织设计和数字治理研究做出贡献。

英文摘要

The integration of Artificial Intelligence (AI) into corporate strategy has become critical for organizations seeking to maintain competitive advantage in the digital age. Although organizations increasingly rely on AI as a strategic and organizational resource, existing C-suite roles remain only partially equipped to govern, integrate, and leverage it coherently at the enterprise level. Organizations vary in their responses. Some create a dedicated Chief AI Officer (CAIO), others extend existing mandates into hybrid roles, and still others coordinate AI through federated structures. This paper develops a role-design theory to explain this variation. I identify three properties that distinguish AI from earlier cross-cutting enterprise technologies (distributed accountability for judgment, upstream governance, and non-stationarity) and three configurations through which organizations respond: concentrated extension, distributed extension, and role creation. The CAIO Framework links these properties to the executive design problems they generate and to the functions and capabilities required of the dedicated role. Four propositions specify when a dedicated CAIO emerges, what form an organization's response takes, when the dedicated role is effective, and how configurations evolve over time. This paper contributes to research on executive leadership, organizational design, and digital governance by offering a theory-driven account of the strategic integration of AI at the executive level.

2507.08064 2026-06-09 cs.MM cs.CV 版本更新

PUMA: Layer-Pruned Language Model for Efficient Unified Multimodal Retrieval with Modality-Adaptive Learning

PUMA: 基于层剪枝的语言模型,用于具有模态自适应学习的高效统一多模态检索

Yibo Lyu, Rui Shao, Gongwei Chen, Yijie Zhu, Weili Guan, Liqiang Nie

发表机构 * Harbin Institute of Technology(哈尔滨工业大学)

AI总结 提出PUMA,通过层剪枝自蒸馏减少MLLM参数,并设计模态自适应对比学习损失(MAC-Loss)提升检索效率,在降低资源消耗的同时保持性能。

AI中文摘要

随着多媒体内容的扩展,现实应用中对统一多模态检索(UMR)的需求日益增加。最近的工作利用多模态大语言模型(MLLM)来解决这一任务。然而,其庞大的参数量导致训练成本高、推理效率低。为此,我们提出PUMA:一种基于层剪枝的语言模型,用于具有模态自适应学习的高效统一多模态检索。我们的方法从结构和学习两个角度改进UMR。(1)在结构上,我们提出层剪枝自蒸馏,通过仅保留浅层来剪枝MLLM,同时从丢弃的深层中蒸馏特征作为教师信号。这减少了参数并保持了表示能力。(2)在学习方面,我们引入模态自适应对比学习损失(MAC-Loss),根据目标模态将批次内负样本分为更难的模态内和更易的模态间两组,分配不同的温度策略以增强学习效率。实验表明,我们的方法在保持强性能的同时显著减少了资源使用。

英文摘要

As multimedia content expands, the demand for unified multimodal retrieval (UMR) in real-world applications increases. Recent work leverages multimodal large language models (MLLMs) to tackle this task. However, their large parameter size results in high training costs and low inference efficiency. To address this, we propose PUMA: a Layer-Pruned Language Model for Efficient Unified Multimodal Retrieval with Modality-Adaptive Learning. Our approach improves UMR from both structural and learning perspectives. (1) Structurally, we propose Layer-Pruned Self-Distillation, which prunes MLLMs by keeping only shallow layers while distilling features from dropped deep layers as teacher signals. This reduces parameters and preserves representation capability. (2) On the learning side, we introduce Modality-Adaptive Contrastive Learning Loss (MAC-Loss), which separates in-batch negatives into harder intra-modality and easier inter-modality groups based on the target modality, assigning different temperature strategies to enhance learning efficiency. Experiments show our method significantly reduces resource usage while maintaining strong performance.
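The abstract describes MAC-Loss only at a high level: in-batch negatives are split into intra-modality and inter-modality groups, each with its own temperature. The sketch below is one plausible reading of that idea as an InfoNCE-style loss, not the paper's formula. Assumptions: cosine-style similarities are already computed, the positive is scaled by the intra-modality temperature, and the temperatures are plain constants rather than whatever "temperature strategies" the paper uses.

```python
import math


def mac_style_infonce(sim_pos, negatives, tau_intra, tau_inter):
    """InfoNCE-style loss with modality-dependent temperatures (sketch).

    sim_pos: similarity between the query and its positive target.
    negatives: list of (similarity, same_modality) pairs; same_modality is
        True when the negative shares the target's modality (the harder,
        intra-modality group) and False otherwise (inter-modality group).
    Assumption: the positive logit is scaled with tau_intra.
    """
    logits = [sim_pos / tau_intra]
    for sim, same_modality in negatives:
        tau = tau_intra if same_modality else tau_inter
        logits.append(sim / tau)
    # Numerically stable log-sum-exp over positive and all negatives.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
    # Negative log-probability assigned to the positive.
    return log_denominator - logits[0]
```

With `tau_intra == tau_inter` this reduces to ordinary InfoNCE; choosing different values changes how sharply each group of negatives is penalized.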

2503.08703 2026-06-09 cs.NE cs.CV 版本更新

SDTrack: A Baseline for Event-based Tracking via Spiking Neural Networks

SDTrack: 基于脉冲神经网络的事件驱动跟踪基线

Yimeng Shan, Zhenbang Ren, Haodi Wu, Wenjie Wei, Rui-Jie Zhu, Shuai Wang, Dehao Zhang, Yichen Xiao, Jieyuan Zhang, Kexin Shi, Jingzhinan Wang, Jason K. Eshraghian, Haicheng Qu, Malu Zhang

发表机构 * University of Electronic Science and Technology of China(电子科技大学) Shenzhen Loop Area Institute(深圳河套学院) Liaoning Technical University(辽宁工程技术大学) University of California, Santa Cruz(加州大学圣克鲁兹分校)

AI总结 提出首个基于Transformer的全脉冲驱动跟踪流水线SDTrack,通过全局轨迹提示方法聚合事件流,实现低能耗高精度跟踪。

Comments 10 pages, 8 figures, 4 tables

AI中文摘要

事件相机提供了优越的时间分辨率、动态范围、能效和像素带宽。脉冲神经网络(SNNs)通过离散脉冲信号自然补充事件数据,使其成为事件驱动跟踪的理想选择。然而,当前结合人工神经网络(ANNs)和SNNs的方法存在次优架构,损害了能效并限制了跟踪性能。为了解决这些限制,我们提出了首个基于Transformer的脉冲驱动跟踪(SDTrack)流水线。它包含一种称为全局轨迹提示(GTP)的新型事件帧聚合方法和一个基于Transformer的跟踪器。GTP方法有效捕获全局轨迹信息,并将其与事件流聚合到事件帧中,以增强时空表示。基于Transformer的跟踪器包括一个完全脉冲驱动的SNN骨干网络和一个简单的跟踪头。SDTrack流水线端到端运行,无需数据增强或后处理。大量实验表明,我们的SDTrack-Tiny版本仅用19.61M参数和8.16mJ能耗即可实现竞争性精度,而Base版本在三个数据集上达到了最先进的精度。我们的工作为未来的神经形态视觉研究奠定了坚实基础。

英文摘要

Event cameras provide superior temporal resolution, dynamic range, energy efficiency, and pixel bandwidth. Spiking Neural Networks (SNNs) naturally complement event data through discrete spike signals, making them ideal for event-based tracking. However, current approaches combining Artificial Neural Networks (ANNs) and SNNs suffer from suboptimal architectures that compromise energy efficiency and limit tracking performance. To address these limitations, we propose the first Transformer-based Spike-Driven Tracking (SDTrack) pipeline. It incorporates a novel event frame aggregation method called Global Trajectory Prompt (GTP) and a Transformer-based tracker. The GTP method effectively captures global trajectory information and aggregates it with event streams into event frames to enhance spatiotemporal representation. The Transformer-based tracker comprises a fully spike-driven SNN backbone and a simple tracking head. The SDTrack pipeline operates end-to-end without data augmentation or post-processing. Extensive experiments demonstrate that our SDTrack-Tiny pipeline achieves competitive accuracy with only 19.61M parameters and 8.16 mJ energy consumption, while our Base version achieves state-of-the-art accuracy across three datasets. Our work establishes a solid foundation for future neuromorphic vision research.
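The "discrete spike signals" mentioned above come from spiking neurons whose output is binary. The abstract does not state SDTrack's exact neuron model, so the sketch below assumes a common discrete-time leaky integrate-and-fire (LIF) formulation with hard reset; `tau`, `v_threshold`, and `v_reset` are illustrative defaults, not values from the paper.

```python
def lif_spikes(inputs, tau=2.0, v_threshold=1.0, v_reset=0.0):
    """Simulate one discrete-time leaky integrate-and-fire neuron.

    Each step the membrane potential leaks toward v_reset while integrating
    the input: v <- v + (x - (v - v_reset)) / tau. The neuron emits a binary
    spike (1) when v reaches v_threshold and is then hard-reset to v_reset.
    """
    v = v_reset
    spikes = []
    for x in inputs:
        v = v + (x - (v - v_reset)) / tau
        if v >= v_threshold:
            spikes.append(1)
            v = v_reset
        else:
            spikes.append(0)
    return spikes
```

A constant input of 1.5 yields the spike train `[0, 1, 0, 1]` over four steps, while a constant input of 0.9 never reaches the threshold. Because downstream layers only ever see 0/1 values, their multiply-accumulate operations can reduce to accumulations, which is the usual source of the energy savings reported for spike-driven backbones.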

2506.20573 2026-06-09 stat.ML cs.LG 版本更新

LARP: Learner-Agnostic Robust Data Prefiltering

LARP: 学习者无关的鲁棒数据预过滤

Kristian Minchev, Dimitar I. Dimitrov, Nikola Konstantinov

发表机构 * INSAIT, Sofia University "St. Kliment Ohridski"(INSAIT,索菲亚大学‘圣克莱门特·奥赫里德斯基’)

AI总结 提出LARP框架,通过预过滤程序保护多种下游学习器性能,理论证明可行性并分析性能损失,实验评估了图像和表格任务中的代价。

Comments Published in Transactions on Machine Learning Research (06/2026). URL: https://openreview.net/forum?id=gI6VOV3jfO

AI中文摘要

公共数据集对现代机器学习和统计推断至关重要,但通常包含低质量或受污染的样本,这可能损害模型性能。因此,需要一种原则性的预过滤程序,数据提供者可以应用该程序同时保护一系列潜在下游统计和学习程序的准确性。在这项工作中,我们形式化并分析了学习者无关的鲁棒数据预过滤(LARP),即设计预过滤程序的问题,该程序对预先指定的学习者集合上的最坏情况损失有保证。我们在两个理论环境中建立了LARP的可行性,通过提供最坏情况损失的上界保证。我们的理论结果表明,与针对单个学习者的特定预过滤相比,通过LARP保护异构学习者集合会以一定的性能损失为代价;我们将这一差距称为LARP的代价。为了评估这一性能差距,我们在图像和表格任务上实证测量了LARP的代价。我们进一步从节省重复数据整理工作的角度探讨了LARP的潜在好处,在一个博弈论模型中,下游学习者可以分摊单一预过滤的成本。

英文摘要

Public datasets, crucial for modern machine learning and statistical inference, often contain low-quality or contaminated samples that can harm model performance. This creates a need for principled prefiltering procedures that a data provider can apply to protect the accuracy of a range of potential downstream statistical and learning procedures simultaneously. In this work, we formalize and analyze Learner-Agnostic Robust data Prefiltering (LARP), the problem of designing prefiltering procedures with guarantees on the worst-case loss over a pre-specified set of learners. We establish the feasibility of LARP in two theoretical settings, by providing upper-bound guarantees on the worst-case loss. Our theoretical results indicate that protecting heterogeneous learner sets via LARP comes at the price of some performance loss compared to individual, learner-specific prefiltering; we call this gap the price of LARP. To assess this gap in performance, we empirically measure the price of LARP across image and tabular tasks. We further explore potential benefits of LARP from the perspective of saving on repeated data curation efforts, in a game-theoretic model where the downstream learners can split the cost of the single prefiltering.

2503.08434 2026-06-09 cs.GR cs.CV 版本更新

Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models

Bokeh Diffusion:文本到图像扩散模型中的散焦模糊控制

Armando Fortes, Tianyi Wei, Shangchen Zhou, Xingang Pan

发表机构 * S-Lab, Nanyang Technological University(S实验室,南洋理工大学)

AI总结 提出Bokeh Diffusion框架,通过物理散焦模糊参数条件化扩散模型,结合混合训练流程和接地自注意力机制,实现场景一致的散景模糊控制,支持双向模糊强度调整和真实图像编辑。

Comments SIGGRAPH Asia 2025. Project page: https://atfortes.github.io/projects/bokeh-diffusion/

AI中文摘要

近年来,大规模文本到图像模型的进展通过从文本提示生成视觉上引人入胜的输出,彻底改变了创意领域;然而,传统摄影通过光圈等相机设置精确控制景深以塑造视觉美学,而当前的扩散模型通常依赖提示工程来模拟此类效果。这种方法往往导致粗略的近似,并意外地改变场景内容。在这项工作中,我们提出了Bokeh Diffusion,一个场景一致的散景控制框架,它显式地将扩散模型条件化在一个物理散焦模糊参数上。为了克服在不同相机设置下捕获的配对真实世界图像的稀缺性,我们引入了一个混合训练流程,将野外图像与合成模糊增强对齐,提供多样化的场景和主体,以及监督学习以分离图像内容与镜头模糊。我们框架的核心是接地自注意力机制,该机制在同一场景的不同散景水平的图像对上训练,使得模糊强度可以在保持底层场景的同时双向调整。大量实验表明,我们的方法实现了灵活的、类似镜头的模糊控制,支持通过反演进行真实图像编辑等下游应用,并在Stable Diffusion和FLUX架构上有效泛化。

英文摘要

Recent advances in large-scale text-to-image models have revolutionized creative fields by generating visually captivating outputs from textual prompts; however, while traditional photography offers precise control over camera settings to shape visual aesthetics (such as depth-of-field via aperture), current diffusion models typically rely on prompt engineering to mimic such effects. This approach often results in crude approximations and inadvertently alters the scene content. In this work, we propose Bokeh Diffusion, a scene-consistent bokeh control framework that explicitly conditions a diffusion model on a physical defocus blur parameter. To overcome the scarcity of paired real-world images captured under different camera settings, we introduce a hybrid training pipeline that aligns in-the-wild images with synthetic blur augmentations, providing diverse scenes and subjects as well as supervision to learn the separation of image content from lens blur. Central to our framework is our grounded self-attention mechanism, trained on image pairs with different bokeh levels of the same scene, which enables blur strength to be adjusted in both directions while preserving the underlying scene. Extensive experiments demonstrate that our approach enables flexible, lens-like blur control, supports downstream applications such as real image editing via inversion, and generalizes effectively across both Stable Diffusion and FLUX architectures.
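The abstract does not say which physical defocus parameter the model is conditioned on. As background only, the sketch below computes one standard physical measure of defocus blur, the thin-lens circle-of-confusion diameter, which links aperture and focus distance to blur size; treating it as related to the paper's conditioning signal is an assumption.

```python
def circle_of_confusion(focal_mm, f_number, focus_mm, subject_mm):
    """Thin-lens circle-of-confusion diameter (mm) on the sensor.

    c = A * |S2 - S1| / S2 * f / (S1 - f), where A = f / N is the aperture
    diameter, S1 the focus distance and S2 the distance of the point being
    imaged. All distances in millimetres; assumes S1 > f.
    """
    aperture_mm = focal_mm / f_number
    return (aperture_mm * abs(subject_mm - focus_mm) / subject_mm
            * focal_mm / (focus_mm - focal_mm))
```

For a 50 mm lens at f/2 focused at 2 m, a point at 4 m gives c = 625/1950 ≈ 0.32 mm, while a point on the focal plane gives c = 0; stopping down to f/4 halves the blur, which is the aperture-driven depth-of-field control the abstract contrasts with prompt engineering.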

2506.04480 2026-06-09 stat.ML cs.LG stat.ME 版本更新

On the Wasserstein Geodesic Principal Component Analysis of probability measures

关于概率测度的Wasserstein测地主成分分析

Nina Vesseron, Elsa Cazelles, Alice Le Brigant, Thierry Klein

发表机构 * CREST-ENSAE, IP Paris(CREST-ENSAE,巴黎理工学院) CNRS, IRIT, Université de Toulouse(CNRS,IRIT,图卢兹大学) Université Paris 1 Panthéon-Sorbonne(巴黎第一大学先贤祠-索邦) ENAC, IMT, Université de Toulouse(ENAC,IMT,图卢兹大学)

AI总结 本文利用Otto-Wasserstein几何,对概率分布集合进行测地主成分分析,通过识别概率测度空间中的测地线来捕捉数据变化模式,并针对高斯分布和绝对连续概率测度提出计算方法。

AI中文摘要

本文关注使用Otto-Wasserstein几何对概率分布集合进行测地主成分分析(GPCA)。目标是识别概率测度空间中能够最好地捕捉底层数据集变化模式的测地线。我们首先处理高斯分布集合的情况,并展示如何将计算提升到可逆线性映射的空间。对于更一般的绝对连续概率测度设置,我们利用一种新颖的方法,通过神经网络参数化Wasserstein空间中的测地线。最后,我们通过各种示例与经典切空间PCA进行比较,并在真实世界数据集上提供说明。

英文摘要

This paper focuses on Geodesic Principal Component Analysis (GPCA) on a collection of probability distributions using the Otto-Wasserstein geometry. The goal is to identify geodesic curves in the space of probability measures that best capture the modes of variation of the underlying dataset. We first address the case of a collection of Gaussian distributions, and show how to lift the computations to the space of invertible linear maps. For the more general setting of absolutely continuous probability measures, we leverage a novel approach to parameterizing geodesics in Wasserstein space with neural networks. Finally, we compare to classical tangent PCA through various examples and provide illustrations on real-world datasets.
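The simplest instance of the Gaussian case is one-dimensional, where everything is explicit: the optimal transport map between N(m0, s0²) and N(m1, s1²) is affine, so the Wasserstein geodesic linearly interpolates the mean and the standard deviation. The sketch below covers only this 1D case as intuition for why Gaussian computations can be lifted to linear maps; it is not the paper's GPCA procedure, and the function names are hypothetical.

```python
import math


def w2_gaussian_1d(m0, s0, m1, s1):
    """2-Wasserstein distance between N(m0, s0^2) and N(m1, s1^2), s0, s1 >= 0."""
    return math.hypot(m1 - m0, s1 - s0)


def geodesic_gaussian_1d(m0, s0, m1, s1, t):
    """(mean, std) at time t in [0, 1] on the Wasserstein geodesic.

    The optimal map x -> m1 + (s1 / s0) * (x - m0) is affine, so McCann
    interpolation moves mean and standard deviation along straight lines.
    """
    return (1 - t) * m0 + t * m1, (1 - t) * s0 + t * s1
```

For example, N(0, 1) and N(3, 2²) are at distance √10, and the geodesic midpoint is N(1.5, 1.5²), which lies at distance √10 / 2 from each endpoint, as constant-speed geodesics require.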

2412.16457 2026-06-09 stat.ML cs.DS cs.LG math.PR math.ST stat.TH 版本更新

Robust Random Graph Matching in Dense Graphs via an Approximate Message Passing Type Algorithm

稠密图中的鲁棒随机图匹配:基于近似消息传递类型算法

Zhangsong Li

发表机构 * Peking University(北京大学)

AI总结 针对带潜在顶点对应的相关高斯Wigner矩阵对,提出一种近似消息传递迭代算法,在对抗性扰动下实现多项式时间匹配恢复,扰动规模可达n^{1-o(1)}。

Comments 46 pages; accepted by IEEE Trans. Inf. Theory

AI中文摘要

本文关注一对具有潜在顶点对应的相关高斯Wigner矩阵的匹配恢复问题。我们特别关注该问题的鲁棒版本,其中观测为扰动输入$(A+E,B+F)$,$(A,B)$是一对相关高斯Wigner矩阵,$E,F$是分别支撑在$A,B$的未知$\epsilon n \times \epsilon n$主子矩阵上的对抗性选择矩阵。我们提出一种近似消息传递(AMP)类型迭代算法,只要$(A,B)$之间的相关性$\rho$为非零常数且$\epsilon = o\big( \tfrac{1}{(\log n)^{20}} \big)$,该算法就能在多项式时间内成功。与标准AMP的关键区别在于,迭代中引入了时间依赖的矩阵乘法步骤,该步骤同时扩大特征维度并在迭代过程中抵消相关性。我们结果的主要方法输入来自\cite{DL22+, DL23+}中提出的迭代随机图匹配算法和\cite{IS24+}中提出的谱预处理过程。据我们所知,我们的算法是首个在任意$n^{1-o(1)}$大小的对抗性扰动下具有鲁棒性的高效随机图匹配类型算法。

英文摘要

In this paper, we focus on the matching recovery problem between a pair of correlated Gaussian Wigner matrices with a latent vertex correspondence. We are particularly interested in a robust version of this problem such that our observation is a perturbed input $(A+E,B+F)$ where $(A,B)$ is a pair of correlated Gaussian Wigner matrices and $E,F$ are adversarially chosen matrices supported on an unknown $\epsilon n \times \epsilon n$ principal minor of $A,B$, respectively. We propose an approximate message passing (AMP) type iterative algorithm that succeeds in polynomial time as long as the correlation $\rho$ between $(A,B)$ is a non-vanishing constant and $\epsilon = o\big( \tfrac{1}{(\log n)^{20}} \big)$. A key distinction from standard AMP is the introduction of a time-dependent matrix multiplication step within the iteration, which simultaneously enlarges the feature dimension and cancels the correlation during the iteration. The main methodological inputs for our result are the iterative random graph matching algorithm proposed in \cite{DL22+, DL23+} and the spectral preprocessing procedure proposed in \cite{IS24+}. To the best of our knowledge, our algorithm is the first efficient random graph matching type algorithm that is robust under any adversarial perturbations of $n^{1-o(1)}$ size.

2505.07833 2026-06-09 cs.DC cs.AI cs.MA cs.OS 版本更新

Harmonia: End-to-End RAG Serving Optimization

Harmonia: 端到端RAG服务优化

Saurabh Agarwal, Bodun Hu, Luis Pabon, Myungjin Lee, Jayanth Srinivasa, Aditya Akella

发表机构 * UT Austin(德克萨斯大学奥斯汀分校) Cisco Research(思科研究) Cisco Systems(思科系统)

AI总结 提出Harmonia框架,通过灵活管道接口、异构感知部署和闭环运行时控制器,优化RAG服务,吞吐量提升2.04倍以上,SLO违规减少78.4%。

AI中文摘要

检索增强生成(RAG)通过集成外部知识提高了大型语言模型的可靠性,但高效服务RAG管道具有挑战性,因为请求会遍历跨越LLM推理、数据库和CPU端处理的异构组件。我们提出了Harmonia,一个端到端的RAG服务框架,通过以下方式解决这些瓶颈:(i) 灵活的管道规范接口,用于组合自定义工作流;(ii) 异构感知部署,将组件作为分布式推理系统进行配置和部署;(iii) 闭环运行时控制器,监控负载和执行进度,并通过请求优先级排序和自动缩放减少SLO违规。在四个RAG应用中,Harmonia优于商业替代方案,吞吐量提升超过2.04倍,同时SLO违规减少高达78.4%。

英文摘要

Retrieval-Augmented Generation (RAG) improves the reliability of large language models by integrating external knowledge, but serving RAG pipelines efficiently is challenging because requests traverse heterogeneous components spanning LLM inference, databases, and CPU-side processing. We present Harmonia, an end-to-end RAG serving framework that addresses these bottlenecks through (i) a flexible pipeline specification interface for composing custom workflows, (ii) heterogeneity-aware deployment that provisions and configures components as a distributed inference system, and (iii) a closed-loop runtime controller that monitors load and execution progress and reduces SLO violations through request prioritization and auto-scaling. Across four RAG applications, Harmonia outperforms commercial alternatives, improving throughput by more than 2.04x while reducing SLO violations by up to 78.4 percent.
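The abstract does not describe Harmonia's prioritization policy. As a generic illustration of SLO-driven request prioritization, the sketch below assumes an earliest-deadline-first queue keyed on each request's arrival time plus its SLO, reporting the remaining slack when a request is dispatched; the class and method names are hypothetical.

```python
import heapq
import itertools


class SlackScheduler:
    """Hypothetical request queue that always serves the request with the
    earliest SLO deadline (earliest-deadline-first)."""

    def __init__(self):
        self._heap = []
        self._tie = itertools.count()  # FIFO tie-breaking for equal deadlines

    def submit(self, request_id, arrival_s, slo_s):
        """Enqueue a request whose latency target is slo_s seconds after arrival."""
        deadline = arrival_s + slo_s
        heapq.heappush(self._heap, (deadline, next(self._tie), request_id))

    def next_request(self, now_s):
        """Pop the most urgent request and return (id, remaining slack in s).

        Negative slack means the request has already missed its SLO.
        """
        deadline, _, request_id = heapq.heappop(self._heap)
        return request_id, deadline - now_s

    def __len__(self):
        return len(self._heap)
```

A closed-loop controller could combine such a queue with load monitoring, for instance scaling out a pipeline stage when dispatched requests repeatedly arrive with little or negative slack.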

2405.17823 2026-06-09 stat.ML cs.LG math.OA 版本更新

Spectral Truncation Kernels: Noncommutativity in $C^*$-algebraic Kernel Machines

谱截断核:C*-代数核机器中的非交换性

Yuka Hashimoto, Ayoub Hafid, Masahiro Ikeda, Hachem Kadri

发表机构 * NTT, Inc.(NTT公司) Center for Advanced Intelligence Project, RIKEN(理化学研究所先进智能项目中心) Graduate School of Mathematical Sciences, The University of Tokyo(东京大学数学科学研究生院) Graduate School of Information Science and Technology, The University of Osaka(大阪大学信息科学与技术研究生院) Department of Computer Science, Aix-Marseille University, CNRS, LIS(艾克斯-马赛大学计算机科学系,CNRS,LIS)

AI总结 提出基于谱截断和C*-代数的谱截断核,通过允许非交换乘积实现函数域上的交互,填补了可分离核与交换核之间的空白,并降低了计算成本。

AI中文摘要

向量值学习和函数值学习中的一个核心问题是如何设计既能捕捉局部和非局部交互又保持计算可行性的核。现有的算子值核仅提供部分答案:可分离核效率高但无法建模函数域上的交互,而交换核仅能捕捉逐点结构。为了解决这个问题,我们提出了谱截断核,这是一类基于谱截断和C*-代数的用于向量值和函数值学习的正定核。通过在核构造中允许非交换乘积,所提出的核能够诱导数据函数域上的交互,并填补了现有可分离核与交换核之间的空白。此外,通过使用C*-代数框架,与现有的使用算子值核的向量值RKHS框架相比,我们降低了计算成本。

英文摘要

A central question in vector- and function-valued learning is how to design kernels that capture both local and non-local interactions while remaining computationally tractable. Existing operator-valued kernels offer only partial answers: separable kernels are efficient but fail to model interactions across the function domain, while commutative kernels capture only pointwise structure. To address this, we propose spectral truncation kernels, a new class of positive definite kernels for vector- and function-valued learning based on spectral truncation and $C^*$-algebra. By allowing noncommutative products in the kernel construction, the proposed kernels induce interactions across the data function domain and fill the gap between existing separable and commutative kernels. In addition, by using the $C^*$-algebraic framework, we reduce the computational cost compared to the existing vector-valued RKHS framework with operator-valued kernels.
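Why does spectral truncation introduce noncommutativity? Multiplication operators by functions commute (fg = gf), but their truncations to finitely many Fourier modes are Toeplitz matrices, which in general do not. The sketch below shows only this mechanism, assuming functions on the circle given by finitely many Fourier coefficients; it does not reproduce the proposed kernels, and the helper names are hypothetical.

```python
def truncated_multiplication(coeffs, n):
    """n x n spectral truncation of multiplication by a function on the circle.

    coeffs maps a Fourier index k to the coefficient c_k of
    f(x) = sum_k c_k e^{ikx}. Compressing multiplication by f to the span of
    e^{i0x}, ..., e^{i(n-1)x} gives the Toeplitz matrix T[j][l] = c_{j-l}.
    """
    return [[coeffs.get(j - l, 0) for l in range(n)] for j in range(n)]


def matmul(X, Y):
    """Plain-Python matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


# f(x) = e^{ix} and g(x) = e^{-ix}: the functions commute, their truncations do not.
T_f = truncated_multiplication({1: 1}, 3)
T_g = truncated_multiplication({-1: 1}, 3)
print(matmul(T_f, T_g))  # → [[0, 0, 0], [0, 1, 0], [0, 0, 1]]
print(matmul(T_g, T_f))  # → [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
```

The two products differ only at the boundary modes, which is where truncation couples different frequencies; that coupling is the kind of interaction across the function domain that commutative (pointwise) kernels cannot express.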

2502.18493 2026-06-09 cs.CE cs.AI 版本更新

Rule-based autocorrection of Piping and Instrumentation Diagrams (P&IDs) on graphs

基于规则的管道与仪表图(P&ID)图形自动校正

Lukas Schulze Balhorn, Niels Seijsener, Kevin Dao, Minji Kim, Dominik P. Goldstein, Ge H. M. Driessen, Artur M. Schweidtmann

发表机构 * Process Intelligence Research Group(过程智能研究组) Department of Chemical Engineering(化学工程系) Delft University of Technology(代尔夫特理工大学) Fluor BV Amsterdam, The Netherlands(荷兰阿姆斯特丹Fluor公司)

AI总结 提出一种基于图表示的规则方法,通过33条化工规则实现P&ID的自动错误检测与校正,案例验证其可靠性。

Journal ref
Systems and Control Transactions, Volume 4, 2025, Pages 1656-1661
AI中文摘要

管道与仪表图(P&ID)是化学过程工程中的核心参考文档。目前,化学工程师通过目视检查手动审查P&ID以发现和纠正错误。然而,工程项目可能涉及数百至数千页P&ID,造成巨大的修订工作量。本研究提出一种基于规则的方法,支持工程师进行P&ID的错误检测与校正。该方法基于P&ID的图表示,通过规则图实现自动错误检测与校正,即自动校正。我们使用pyDEXPI Python包从DEXPI标准的P&ID生成P&ID图。在本研究中,我们基于化学工程知识和启发式方法开发了33条规则,并展示了其中五条选定的规则作为示例。一个示例P&ID的案例研究验证了基于规则的自动校正方法在修订P&ID中的可靠性和有效性。

英文摘要

A piping and instrumentation diagram (P&ID) is a central reference document in chemical process engineering. Currently, chemical engineers manually review P&IDs through visual inspection to find and rectify errors. However, engineering projects can involve hundreds to thousands of P&ID pages, creating a significant revision workload. This study proposes a rule-based method to support engineers with error detection and correction in P&IDs. The method is based on a graph representation of P&IDs, enabling automated error detection and correction, i.e., autocorrection, through rule graphs. We use our pyDEXPI Python package to generate P&ID graphs from DEXPI-standard P&IDs. In this study, we developed 33 rules based on chemical engineering knowledge and heuristics, with five selected rules demonstrated as examples. A case study on an illustrative P&ID validates the reliability and effectiveness of the rule-based autocorrection method in revising P&IDs.
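A rule on a P&ID graph pairs a pattern check (detection) with a graph rewrite (correction). The sketch below illustrates that pattern with one hypothetical rule, "a pump's discharge should lead to a check valve", which is not claimed to be among the paper's 33 rules. It uses plain Python dicts and edge sets rather than the pyDEXPI graph format, and the node-naming scheme for inserted valves is likewise hypothetical.

```python
def check_pump_discharge(nodes, edges):
    """Detection: return pumps with no check valve directly downstream.

    nodes: dict node_id -> equipment type; edges: set of (upstream, downstream)
    pairs following the flow direction.
    """
    violations = []
    for node, kind in nodes.items():
        if kind != "pump":
            continue
        downstream = [d for (u, d) in edges if u == node]
        if not any(nodes[d] == "check_valve" for d in downstream):
            violations.append(node)
    return sorted(violations)


def autocorrect_pump_discharge(nodes, edges):
    """Correction: insert a new check valve between each violating pump and
    everything downstream of it. Returns new (nodes, edges) copies."""
    nodes, edges = dict(nodes), set(edges)
    for pump in check_pump_discharge(nodes, edges):
        valve = f"{pump}_CV"  # hypothetical naming scheme for the new valve
        nodes[valve] = "check_valve"
        for (u, d) in list(edges):  # iterate over a copy while mutating
            if u == pump:
                edges.remove((u, d))
                edges.add((valve, d))
        edges.add((pump, valve))
    return nodes, edges
```

Running detection again on the corrected graph should report no violations, which is a cheap way to confirm that a correction rule actually resolves the error it targets.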

2412.19754 2026-06-09 econ.GN cs.AI q-fin.EC 版本更新

Complement or substitute? How AI increases the demand for human skills

互补还是替代?AI如何增加对人类技能的需求

Elina Mäkelä, Matthew Bone, Mareike Sehrer, Farah Nanji, Fabian Stephany

发表机构 * Oxford Internet Institute, University of Oxford(牛津互联网研究所,牛津大学) Burning Glass Institute(燃烧玻璃研究所) Institute for New Economic Thinking, Oxford Martin School(新经济思想研究所,牛津马丁学院) Bruegel(布鲁盖尔研究所)

AI总结 基于2018-2024年美、英、澳近3000万条招聘数据,发现AI岗位更需分析思维等互补技能,且这些技能带来工资溢价,并溢出至非AI岗位,同时替代技能需求下降。

Comments 69

AI中文摘要

人工智能(AI)正在改变工作的性质,但关于它如何影响对人类技能的需求,实证证据有限。本文研究了AI采纳是否增加了与AI技术技能互补的人类能力(如分析思维、韧性或道德判断)在AI密集型岗位内外的重要性和价值。利用2018年至2024年间来自美国、英国和澳大利亚的近3000万条招聘数据,我们区分了公司、行业和地区层面的内部效应(AI岗位内)和外部效应(非AI岗位)。本文有三个主要发现。首先,我们发现AI密集型岗位显著更可能需要互补的非技术能力,如分析思维、韧性和数字素养。其次,这些互补技能与可观的工资溢价相关,尤其是在管理、销售或金融等与AI合作的岗位中。第三,我们表明AI扩散具有潜在的溢出效应:随着AI在公司、行业和地区内的采纳增加,即使是非AI岗位对互补技能的需求也会增加,而对可替代技能(如总结、翻译或客户服务)的需求则下降。这些趋势在美国、英国和澳大利亚等地区均成立,证实了我们发现的稳健性。总之,这些发现表明AI并非简单地替代任务或需要更多AI开发者技能;它可能正在转变劳动力技能需求,以青睐那些增强与智能系统协作的人类特质。

英文摘要

Artificial Intelligence (AI) is transforming the nature of work, yet there is limited empirical evidence on how it affects demand for human skills. This paper examines whether AI adoption increases the prevalence and value of human capabilities that complement technical AI skills, such as analytical thinking, resilience, or ethical judgment, within and beyond AI-intensive job roles. Using a dataset of nearly 30 million job postings from the US, the UK and Australia, between 2018 and 2024, we distinguish between internal effects (within AI roles) and external effects (in non-AI roles) across companies, industries, and regions. This paper has three main findings. First, we find that AI-intensive roles are significantly more likely to require complementary non-technical capabilities, such as analytical thinking, resilience, and digital literacy. Second, these complementary skills are associated with meaningful wage premiums, particularly in managerial, sales or finance roles working with AI. Third, we show that AI diffusion has potential spillover effects: as AI adoption rises within companies, industries, and regions, demand for complementary skills increases even in non-AI roles while demand for substitutable skills (summarisation, translation or customer service) decreases. These trends hold across geographies, including the United States, United Kingdom, and Australia, confirming the robustness of our findings. Together, these findings indicate that AI is not simply replacing tasks or requiring more AI developer skills; it may be transforming workforce skill requirements to favor human attributes that enhance collaboration with intelligent systems.