arXivDaily: arXiv daily academic digest, updated Monday through Friday
2602.03972 2026-06-04 stat.ML cs.AI cs.LG

Fixed Budget is No Harder Than Fixed Confidence in Best-Arm Identification up to Logarithmic Factors

Kapilan Balagopalan, Yinan Li, Yao Zhao, Tuan Nguyen, Anton Daitche, Houssam Nassif, Kwang-Sung Jun

Affiliations University of California, Berkeley; University of Washington; University of Texas at Austin

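AI summary Proposes FC2FB, a meta-algorithm that converts a fixed-confidence algorithm into a fixed-budget one, and proves that the fixed-budget sample complexity is no higher than the fixed-confidence sample complexity up to logarithmic factors.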

Journal ref International Conference on Machine Learning (ICML'26), Seoul, Korea, 2026

Abstract

The best-arm identification (BAI) problem is one of the most fundamental problems in interactive machine learning and comes in two flavors: the fixed-budget setting (FB) and the fixed-confidence setting (FC). For $K$-armed bandits with a unique best arm, the optimal sample complexities for both settings have been settled, and they match up to logarithmic factors. This prompts an interesting research question about generic, potentially structured BAI problems: is FB harder than FC, or the other way around? In this paper, we show that FB is no harder than FC up to logarithmic factors. We do this constructively: we propose a novel algorithm called FC2FB (fixed confidence to fixed budget), a meta-algorithm that takes in an FC algorithm $\mathcal{A}$ and turns it into an FB algorithm. We prove that FC2FB enjoys a sample complexity that matches, up to logarithmic factors, the sample complexity of $\mathcal{A}$. This means that the optimal FC sample complexity is an upper bound on the optimal FB sample complexity up to logarithmic factors. Our result not only reveals a fundamental relationship between FB and FC, but also has a significant implication: FC2FB combined with existing state-of-the-art FC algorithms leads to improved sample complexity for a number of FB problems.

2602.15202 2026-06-04 quant-ph cs.AI cs.NA eess.SP math.NA stat.CO

Tomography by Design: An Algebraic Approach to Low-Rank Quantum States

Shakir Showkat Sofi, Charlotte Vermeylen, Lieven De Lathauwer

Affiliations Leuven.AI - KU Leuven institute for AI

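AI summary Proposes an algebraic algorithm that estimates structured entries of the density matrix from measurements of specific observables and, under a low-rank assumption, completes the matrix with numerical linear algebra, yielding efficient and deterministic quantum state tomography.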

Comments 5 pages, Accepted to EUSIPCO 2026

Abstract

We present an algebraic algorithm for quantum state tomography that leverages measurements of certain observables to estimate structured entries of the underlying density matrix. Under low-rank assumptions, the remaining entries can be obtained solely using standard numerical linear algebra operations. The proposed algebraic matrix completion framework applies to a broad class of generic, low-rank mixed quantum states and, compared with state-of-the-art methods, is computationally efficient while providing deterministic recovery guarantees.
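
To make the idea of algebraic low-rank completion concrete, here is a minimal sketch in plain Python. It is not the authors' algorithm: it assumes, purely for illustration, that the "structured entries" are the first two columns of a rank-2 Hermitian matrix, and it fills in the rest with the standard identity rho = C W^{-1} C^H, where W is the top-left 2x2 block. The function name `complete_rank2` and the test state are hypothetical.

```python
def complete_rank2(cols):
    """Complete a Hermitian rank-2 matrix from its first two columns.

    cols[i] = (rho[i][0], rho[i][1]). Uses rho = C W^{-1} C^H, where W is the
    top-left 2x2 block of C. Assumption: W is invertible (true generically).
    """
    n = len(cols)
    (a, b), (c, d) = cols[0], cols[1]
    det = a * d - b * c
    w_inv = [[d / det, -b / det], [-c / det, a / det]]
    # cw = C @ W^{-1}, an n x 2 matrix
    cw = [[sum(cols[i][k] * w_inv[k][j] for k in range(2)) for j in range(2)]
          for i in range(n)]
    # rho = cw @ C^H, an n x n matrix
    return [[sum(cw[i][k] * cols[j][k].conjugate() for k in range(2))
             for j in range(n)] for i in range(n)]


# Hypothetical rank-2 test matrix: 0.6 |v1><v1| + 0.4 |v2><v2| (vectors left unnormalised).
v1 = [1 + 0j, 1j, 0.5 + 0j, -1 + 0j]
v2 = [0j, 1 + 0j, 1 - 1j, 2 + 0j]
rho = [[0.6 * v1[i] * v1[j].conjugate() + 0.4 * v2[i] * v2[j].conjugate()
        for j in range(4)] for i in range(4)]

known = [(rho[i][0], rho[i][1]) for i in range(4)]  # the only entries we "measure"
rho_hat = complete_rank2(known)
max_err = max(abs(rho_hat[i][j] - rho[i][j]) for i in range(4) for j in range(4))
print(f"max reconstruction error: {max_err:.2e}")
```

The recovery here is deterministic and uses only a 2x2 inverse and matrix products, which is the flavour of guarantee the abstract describes; which entries the paper actually measures, and how it handles noise, is not stated in the abstract.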

2602.14885 2026-06-04 cond-mat.dis-nn cond-mat.stat-mech cs.LG q-bio.NC

Drift-Diffusion Matching: Embedding dynamics in latent manifolds of asymmetric neural networks

Ramón Nartallo-Kaluarachchi, Renaud Lambiotte, Alain Goriely

Affiliations Mathematical Institute, University of Oxford; Centre for Eudaimonia and Human Flourishing, University of Oxford; Complexity Science Hub, Vienna

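AI summary Proposes drift-diffusion matching, a framework that trains continuous-time recurrent neural networks to embed arbitrary nonlinear stochastic differential equations in a low-dimensional latent subspace, uses asymmetric connectivity to realise nonequilibrium dynamics, and applies it to modelling associative and sequential memory.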

Comments 25 pages, 16 figures

Abstract

Recurrent neural networks (RNNs) provide a theoretical framework for understanding computation in biological neural circuits, yet classical results, such as Hopfield's model of associative memory, rely on symmetric connectivity that restricts network dynamics to gradient-like flows. In contrast, biological networks support rich time-dependent behaviour facilitated by their asymmetry. Here we introduce a general framework, which we term drift-diffusion matching, for training continuous-time RNNs to represent arbitrary, nonlinear stochastic differential equations (SDEs), with given drift and diffusion coefficients, within a low-dimensional latent subspace. Allowing asymmetric connectivity, we show that RNNs can faithfully embed the drift and diffusion of a given SDE, including nonlinear and nonequilibrium dynamics such as chaotic attractors. As an application, we construct RNN realisations of stochastic systems that transiently explore various attractors through both input-driven switching and autonomous transitions driven by nonequilibrium currents, which we interpret as models of associative and sequential (episodic) memory. To elucidate how these dynamics are encoded in the network, we introduce decompositions of the RNN based on its asymmetric connectivity and its time-irreversibility. Our results extend attractor neural network theory beyond equilibrium, showing that asymmetric neural populations can implement a broad class of dynamical computations within low-dimensional manifolds, unifying ideas from associative memory, nonequilibrium statistical mechanics, and neural computation.

2602.14117 2026-06-04 cs.NI cs.AI

Toward Autonomous O-RAN: A Multi-Scale Agentic AI Framework for Real-Time Network Control and Management

Hojjat Navidan, Mohammad Cheraghinia, Jaron Fontaine, Mohamed Seif, Eli De Poorter, H. Vincent Poor, Ingrid Moerman, Adnan Shahid

Affiliations IDLab, Department of Information Technology at Ghent University - imec; Department of Electrical and Computer Engineering, Princeton University

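AI summary Proposes a multi-scale agentic AI framework that enables autonomous O-RAN network control and management through a coordinated hierarchy of non-real-time, near-real-time, and real-time control loops, and validates it under non-stationary conditions and in an intent-driven slice resource control scenario.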

Comments Submitted to the IEEE Networks Journal

Abstract

Open Radio Access Networks (O-RAN) promise flexible 6G network access through disaggregated, software-driven components and open interfaces, but this programmability also increases operational complexity. Multiple control loops coexist across the service management layer and RAN Intelligent Controller (RIC), while independently developed control applications can interact in unintended ways. In parallel, recent advances in generative Artificial Intelligence (AI) are enabling a shift from isolated AI models toward agentic AI systems that can interpret goals, coordinate multiple models and control functions, and adapt their behavior over time. This article proposes a multi-scale agentic AI framework for O-RAN that organizes RAN intelligence as a coordinated hierarchy across the Non-Real-Time (Non-RT), Near-Real-Time (Near-RT), and Real-Time (RT) control loops: (i) a Large Language Model (LLM) agent in the Non-RT RIC translates operator intent into policies and governs model lifecycles; (ii) Small Language Model (SLM) agents in the Near-RT RIC execute low-latency optimization and can activate, tune, or disable existing control applications; and (iii) Wireless Physical-layer Foundation Model (WPFM) agents near the distributed unit provide fast inference close to the air interface. We describe how these agents cooperate through standardized O-RAN interfaces and telemetry. Using a proof-of-concept implementation built on open-source models, software, and datasets, we demonstrate the proposed agentic approach in two representative scenarios: robust operation under non-stationary conditions and intent-driven slice resource control.

2602.11189 2026-06-04 q-bio.BM cs.AI

MuCO: Generative Peptide Cyclization Empowered by Multi-stage Conformation Optimization

Yitian Wang, Fanmeng Wang, Angxiao Yue, Wentao Guo, Yaning Cui, Hongteng Xu

Affiliations Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; Beijing Key Laboratory of Research on Large Models; Engineering Research Center of Next-Generation Intelligent Search

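AI summary Proposes MuCO, which generates cyclic peptide conformations through multi-stage conformation optimization and outperforms existing methods in physical stability, structural diversity, and computational efficiency.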

Abstract

Modeling peptide cyclization is critical for the virtual screening of candidate peptides with desirable physical and pharmaceutical properties. This task is challenging because a cyclic peptide often exhibits diverse, ring-shaped conformations, which cannot be well captured by deterministic prediction models derived from linear peptide folding. In this study, we propose MuCO (Multi-stage Conformation Optimization), a generative peptide cyclization method that models the distribution of cyclic peptide conformations conditioned on the corresponding linear peptide. In principle, MuCO decouples the peptide cyclization task into three stages: topology-aware backbone design, generative side-chain packing, and physics-aware all-atom optimization, thereby generating and optimizing conformations of cyclic peptides in a coarse-to-fine manner. This multi-stage framework enables an efficient parallel sampling strategy for conformation generation and allows for rapid exploration of diverse, low-energy conformations. Experiments on the large-scale CPSea dataset demonstrate that MuCO significantly and consistently outperforms state-of-the-art methods in physical stability, structural diversity, secondary structure recovery, and computational efficiency, making it a promising computational tool for exploring and designing cyclic peptides. The demo of the proposed method can be found at https://github.com/mianqiu00/MuCO.

2509.16301 2026-06-04 q-bio.QM cs.LG

TF-DWGNet: A Directed Weighted Graph Neural Network with Tensor Fusion for Multi-Omics Cancer Subtype Classification

Tiantian Yang, Zhiqian Chen

Affiliations Mathematics and Statistical Science, University of Idaho; Computer Science and Engineering, Mississippi State University

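AI summary Proposes the TF-DWGNet framework, which combines tree-based directed weighted graph construction with a tensor fusion mechanism to handle multi-omics heterogeneity and higher-order interactions; it outperforms existing methods on cancer subtype classification and offers interpretability.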

Comments 9 pages, 4 figures, 4 tables

Journal ref NAR Genomics and Bioinformatics, Volume 8, Issue 2, 2026, lqag054

Abstract

Integration and analysis of multi-omics data provide valuable insights for improving cancer subtype classification. However, such data are inherently heterogeneous, high-dimensional, and exhibit complex intra- and inter-modality dependencies. Graph neural networks (GNNs) offer a principled framework for modeling these structures, but existing approaches often rely on prior knowledge or predefined similarity networks that produce undirected or unweighted graphs and fail to capture task-specific directionality and interaction strength. Interpretability at both the modality and feature levels also remains limited. To address these challenges, we propose TF-DWGNet, a novel Graph Neural Network framework that combines tree-based Directed Weighted graph construction with Tensor Fusion for multiclass cancer subtype classification. TF-DWGNet introduces two key innovations: (i) a supervised tree-based strategy that constructs directed, weighted graphs tailored to each omics modality, and (ii) a tensor fusion mechanism that captures unimodal, bimodal, and trimodal interactions using low-rank decomposition for computational efficiency. Experiments on three real-world cancer datasets demonstrate that TF-DWGNet consistently outperforms state-of-the-art baselines across multiple metrics and statistical tests. In addition, the model provides biologically meaningful insights through modality-level contribution scores and ranked feature importance. These results highlight that TF-DWGNet is an effective and interpretable solution for multi-omics integration in cancer research.
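
The tensor fusion idea in item (ii) can be illustrated with a small sketch. It assumes one common construction, which the abstract does not confirm as the authors' exact formulation: each modality embedding is prefixed with a constant 1 and the three vectors are combined by an outer product, so the resulting tensor holds unimodal, bimodal, and trimodal terms side by side. The function name and the toy embeddings are hypothetical, and the low-rank decomposition the paper uses for efficiency is not shown.

```python
def tensor_fusion(z1, z2, z3):
    """Outer product of three modality embeddings, each prefixed with a constant 1."""
    e1 = [1.0] + list(z1)
    e2 = [1.0] + list(z2)
    e3 = [1.0] + list(z3)
    return [[[x * y * w for w in e3] for y in e2] for x in e1]


# Toy embeddings for three hypothetical omics modalities.
z_a, z_b, z_c = [2.0, 3.0], [5.0], [7.0, 11.0]
T = tensor_fusion(z_a, z_b, z_c)

print(T[1][0][0])  # unimodal term: z_a[0]
print(T[1][1][0])  # bimodal term: z_a[0] * z_b[0]
print(T[2][1][2])  # trimodal term: z_a[1] * z_b[0] * z_c[1]
```

The full tensor grows as the product of the embedding sizes, which is why a low-rank factorisation of the fusion weights, as the abstract mentions, is attractive in practice.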

2602.09464 2026-06-04 cs.SE cs.AI cs.CL

AlgoVeri: An Aligned Benchmark for Verified Code Generation on Classical Algorithms

Haoyu Zhao, Ziran Yang, Jiawei Li, Deyuan He, Zenan Li, Chi Jin, Venugopal V. Veeravalli, Aarti Gupta, Sanjeev Arora

Affiliations University of California, Berkeley

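AI summary To address the lack of a unified method for evaluating verified code generation across paradigms, proposes the AlgoVeri benchmark, which evaluates verified code generation for 77 classical algorithms in Dafny, Verus, and Lean and reveals capability gaps between verification systems.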

Comments Accepted to ICML 2026, 32 pages

Abstract

Vericoding refers to the generation of formally verified code from rigorous specifications. Recent AI models show promise in vericoding, but a unified methodology for cross-paradigm evaluation is lacking. Existing benchmarks test only individual languages/tools (e.g., Dafny, Verus, and Lean) and each covers very different tasks, so the performance numbers are not directly comparable. We address this gap with AlgoVeri, a benchmark that evaluates vericoding of 77 classical algorithms in Dafny, Verus, and Lean. By enforcing identical functional contracts, AlgoVeri reveals critical capability gaps in verification systems. While frontier models achieve tractable success in Dafny (40.3% for Gemini-3 Flash), where high-level abstractions and SMT automation simplify the workflow, performance collapses under the systems-level memory constraints of Verus (24.7%) and the explicit proof construction required by Lean (7.8%). Beyond aggregate metrics, we uncover a sharp divergence in test-time compute dynamics: Gemini-3 effectively utilizes iterative repair to boost performance (e.g., tripling pass rates in Dafny), whereas GPT-OSS saturates early. Finally, our error analysis shows that language design affects the refinement trajectory: while Dafny allows models to focus on logical correctness, Verus and Lean trap models in persistent syntactic and semantic barriers. All data and evaluation code can be found at https://github.com/haoyuzhao123/algoveri.
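
As a quick sanity check on the reported percentages, the arithmetic below converts them into task counts. It assumes each rate is a fraction of the 77 benchmark tasks, which the abstract suggests but does not state outright; the variable names are illustrative only.

```python
TOTAL_TASKS = 77  # number of algorithms stated in the abstract
reported = {"Dafny": 40.3, "Verus": 24.7, "Lean": 7.8}  # percentages from the abstract


def implied_count(percent, total=TOTAL_TASKS):
    """Nearest whole number of solved tasks implied by a percentage."""
    return round(percent / 100 * total)


implied = {lang: implied_count(p) for lang, p in reported.items()}
for lang, k in implied.items():
    print(f"{lang}: about {k}/{TOTAL_TASKS} tasks ({100 * k / TOTAL_TASKS:.1f}%)")
```

Under that assumption the rates correspond to roughly 31, 19, and 6 of the 77 tasks, and each count rounds back to the reported percentage.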

2602.06911 2026-06-04 cs.CR cs.AI

TamperBench: Systematically Stress-Testing LLM Safety Under Fine-Tuning and Tampering

Saad Hossain, Tom Tseng, Punya Syon Pandey, Samanvay Vajpayee, Matthew Kowal, Nayeema Nonta, Samuel Simko, Stephen Casper, Zhijing Jin, Kellin Pelrine, Sirisha Rambhatla

Affiliations Critical ML Lab, University of Waterloo, Waterloo, Canada; FAR.AI, Berkeley, USA; University of Toronto, Toronto, Canada; University of Waterloo, Waterloo, Canada; ETH Zürich, Zürich, Switzerland; MIT CSAIL, Cambridge, USA; University of Toronto, MPI, EuroSafeAI, Vector Institute, Toronto, Canada

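AI summary Proposes TamperBench, a unified framework that uses systematic hyperparameter sweeps to evaluate the safety and utility of 21 open-weight LLMs under nine tampering threats; it finds that jailbreak-tuning is typically the most severe attack and that current alignment-stage defenses largely fail.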

Comments 25 pages, 15 figures

Abstract

As increasingly capable open-weight large language models (LLMs) are deployed, improving their tamper resistance against unsafe modifications, whether accidental or intentional, becomes critical to minimize risks. However, there is no standard approach to evaluate tamper resistance. Varied datasets, metrics, and tampering configurations make it difficult to compare safety, utility, and robustness across different models and defenses. To address this, we introduce TamperBench, the first unified framework to systematically evaluate the tamper resistance of LLMs. TamperBench (i) curates a repository of state-of-the-art weight-space fine-tuning attacks, latent-space representation attacks, and alignment-stage defenses; (ii) enables realistic adversarial evaluation through systematic hyperparameter sweeps per attack-model pair; and (iii) provides both safety and utility evaluations. We use TamperBench to evaluate 21 open-weight LLMs, including defense-augmented variants, across nine tampering threats using standardized safety and capability metrics with hyperparameter sweeps per model-attack pair. The results provide insights including effects of post-training on tamper resistance, that jailbreak-tuning is typically the most severe attack, and that current alignment-stage defenses largely fail to withstand attack sweeps. Code is available at https://github.com/criticalml-uw/TamperBench.

2601.07144 2026-06-04 stat.ML cs.LG math.ST stat.TH

Optimal Transport under Group Fairness Constraints

Linus Bleistein, Mathieu Dagréou, Francisco Andrade, Thomas Boudou, Aurélien Bellet

Affiliations University of California, Berkeley

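AI summary Addresses group fairness in optimal transport: proposes a modified Sinkhorn algorithm that achieves perfect fairness and develops two relaxation strategies (penalized OT and learning a ground cost via bilevel optimization) to balance fairness against matching quality, with theoretical guarantees and empirical results.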

Comments Accepted at ICML 2026 (spotlight)

Abstract

Ensuring fairness in matching algorithms is a key challenge in allocating scarce resources and positions. Focusing on Optimal Transport (OT), we introduce a novel notion of group fairness requiring that the probability of matching two individuals from any two given groups in the OT plan satisfies a predefined target. We first propose a modified Sinkhorn algorithm to compute perfectly fair transport plans efficiently. Since exact fairness can significantly degrade matching quality in practice, we then develop two relaxation strategies. The first one involves solving a penalized OT problem, for which we derive novel finite-sample complexity guarantees. Our second strategy leverages bilevel optimization to learn a ground cost that induces a fair OT solution, and we establish a bound on the deviation of fairness when matching unseen data. Finally, we present empirical results illustrating the performance of our approaches and the trade-off between fairness and transport cost.
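
For orientation, the sketch below implements the standard, unmodified Sinkhorn iteration for entropic OT in plain Python, plus a helper that sums the plan's mass between groups. It is not the paper's modified algorithm: the fairness constraint itself is not enforced here, and reading the fairness notion as "group-to-group masses must hit predefined targets" is an assumption. The cost matrix, histograms, and group labels are made up.

```python
import math


def sinkhorn(cost, a, b, eps=0.5, n_iter=500):
    """Entropic OT plan between histograms a and b for a given cost matrix."""
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternate scaling so that rows match a, then columns match b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def group_mass(plan, src_group, tgt_group):
    """Total plan mass matched between each (source group, target group) pair."""
    mass = {}
    for i, row in enumerate(plan):
        for j, p in enumerate(row):
            key = (src_group[i], tgt_group[j])
            mass[key] = mass.get(key, 0.0) + p
    return mass


cost = [[0.0, 1.0, 2.0], [2.0, 1.0, 0.0]]
a = [0.5, 0.5]
b = [0.2, 0.3, 0.5]
plan = sinkhorn(cost, a, b)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(2)) for j in range(3)]
masses = group_mass(plan, ["g0", "g1"], ["g0", "g0", "g1"])
print(masses)
```

A fairness-aware variant would have to steer the values in `masses` toward their targets, either exactly or through a penalty, which is where the trade-off with transport cost described in the abstract comes from.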

2601.21868 2026-06-04 stat.ML cs.LG

On Forgetting and Stability of Score-based Generative models

Stanislas Strasman, Gabriel Cardoso, Sylvain Le Corff, Vincent Lemaire, Antonio Ocello

Affiliations Sorbonne Université and Université Paris Cité, CNRS, LPSM; Center for Statistics and Images, Mines Paris, PSL University; CREST, ENSAE Paris, Institut Polytechnique de Paris

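AI summary Using the forgetting and stability properties of the Markov chain associated with the reverse-time dynamics, the paper quantitatively bounds the sampling error of score-based generative models under weak assumptions, via a Lyapunov drift condition and a Doeblin-type minorization condition, and proves quantitative stability of the sampling procedure.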

Abstract

Understanding the stability and long-time behavior of generative models is a fundamental problem in modern machine learning. This paper provides quantitative bounds on the sampling error of score-based generative models by leveraging stability and forgetting properties of the Markov chain associated with the reverse-time dynamics. Under weak assumptions, we provide the two structural properties to ensure the propagation of initialization and discretization errors of the backward process: a Lyapunov drift condition and a Doeblin-type minorization condition. A practical consequence is quantitative stability of the sampling procedure, as the reverse diffusion dynamics induces a contraction mechanism along the sampling trajectory. Our results clarify the role of stochastic dynamics in score-based models and provide a principled framework for analyzing propagation of errors in such approaches.
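
For readers unfamiliar with the two conditions named above, a standard textbook form for a Markov kernel $P$ on a state space $\mathsf{X}$ is sketched below. The symbols $V$, $\lambda$, $b$, $\varepsilon$, $\nu$, and $d$ are generic notation introduced here, and the paper's exact assumptions may differ.

```latex
% Lyapunov drift condition, with V : X -> [1, \infty), \lambda \in [0, 1), b < \infty
PV(x) \;=\; \int_{\mathsf{X}} V(y)\, P(x, \mathrm{d}y) \;\le\; \lambda V(x) + b
\qquad \text{for all } x \in \mathsf{X}.

% Doeblin-type minorization on a sublevel set C = \{ x : V(x) \le d \},
% with \varepsilon \in (0, 1] and \nu a probability measure
P(x, A) \;\ge\; \varepsilon\, \nu(A)
\qquad \text{for all } x \in C \text{ and all measurable } A \subseteq \mathsf{X}.
```

Together, conditions of this kind typically imply that two copies of the chain started from different points contract toward each other, which is the forgetting mechanism the abstract uses to control initialization and discretization errors.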

2512.04668 2026-06-04 cs.CR cs.AI cs.CL

Topology Matters: Measuring Memory Leakage in Multi-Agent LLMs

Jinbo Liu, Defu Cao, Yifei Wei, Tianyao Su, Yuan Liang, Yushun Dong, Yan Liu, Yue Zhao, Xiyang Hu

Affiliations Arizona State University; University of Southern California; Florida State University

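AI summary Proposes the MAMA framework, which evaluates memory leakage in multi-agent LLM systems under controlled graph topologies; it finds that dense connectivity, short attacker-target distance, and high centrality increase leakage, and recommends sparse or hierarchical topologies.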

Comments Accepted to Findings of the Association for Computational Linguistics: ACL 2026. Camera-ready version

Abstract

Graph topology is a fundamental determinant of memory leakage in multi-agent LLM systems, yet its effects remain poorly quantified. We introduce MAMA (Multi-Agent Memory Attack), a controlled evaluation framework for comparing topology-conditioned memory leakage in multi-agent LLM systems. MAMA operates on synthetic documents containing labeled Personally Identifiable Information (PII) entities, from which we generate sanitized task instructions. We execute a two-phase protocol: Engram (seeding private information into a target agent's memory) and Resonance (multi-round interaction where an attacker attempts extraction). Over 10 rounds, we measure leakage using a two-stage recovery criterion that combines exact-match extraction with LLM-based inference over the attacker's final output. We evaluate six canonical topologies (complete, circle, chain, tree, star, star-ring) across $n\in\{4,5,6\}$, attacker-target placements, and base models. Results are consistent: denser connectivity, shorter attacker-target distance, and higher target centrality increase leakage; most leakage occurs in early rounds and then plateaus; model choice shifts absolute rates but preserves broad structural trends; spatiotemporal/location attributes leak more readily than identity credentials or regulated identifiers. We distill practical guidance for system design: favor sparse or hierarchical connectivity, maximize attacker-target separation, and restrict hub/shortcut pathways via topology-aware access control. Our code is available at https://github.com/llll121/mama-eval.
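
The structural quantities the abstract relates to leakage (connectivity density, attacker-target distance, target centrality) are easy to compute for small agent graphs. The sketch below is illustrative only and is not taken from the MAMA code base: it builds four of the six topologies (tree and star-ring are omitted), and the helper names, the choice of agent 0 as the star hub, and degree centrality as the centrality measure are all assumptions.

```python
from collections import deque


def build_topology(kind, n):
    """Undirected adjacency sets for a few canonical n-agent topologies."""
    adj = {i: set() for i in range(n)}

    def link(i, j):
        adj[i].add(j)
        adj[j].add(i)

    if kind == "complete":
        for i in range(n):
            for j in range(i + 1, n):
                link(i, j)
    elif kind == "chain":
        for i in range(n - 1):
            link(i, i + 1)
    elif kind == "circle":
        for i in range(n):
            link(i, (i + 1) % n)
    elif kind == "star":  # agent 0 is the hub (assumption)
        for i in range(1, n):
            link(0, i)
    else:
        raise ValueError(f"unknown topology: {kind}")
    return adj


def distance(adj, src, dst):
    """Hop count of the shortest path from src to dst (BFS), or None if unreachable."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        if node == dst:
            return d
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None


def density(adj):
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    edges = sum(len(nb) for nb in adj.values()) / 2
    return edges / (n * (n - 1) / 2)


def degree_centrality(adj, node):
    """Share of the other agents that `node` is directly connected to."""
    return len(adj[node]) / (len(adj) - 1)


for kind in ("complete", "circle", "chain", "star"):
    g = build_topology(kind, 5)
    print(kind, round(density(g), 2), distance(g, 1, 3), degree_centrality(g, 0))
```

With attacker and target placed at the ends of a chain, the hop distance is as large as it can be for that size, which matches the abstract's guidance to favour sparse connectivity and maximise attacker-target separation.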

2509.05510 2026-06-04 physics.comp-ph cs.LG

Causal Multi-fidelity Surrogate Forward and Inverse Models for ICF Implosions

Tyler E. Maltba, Ben S. Southworth, Jeffrey R. Haack, Marc L. Klasky

Affiliations Theoretical Division, Los Alamos National Laboratory; Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory

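AI summary For inverse problems in inertial confinement fusion, builds a causal, dynamic, multi-fidelity reduced-order surrogate that learns a controller from low- and high-fidelity training data, and uses machine learning models to optimize the radiation temperature drive so that it reproduces observed interface dynamics.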

Abstract

Continued progress in inertial confinement fusion (ICF) requires solving inverse problems relating experimental observations to simulation input parameters, followed by design optimization. However, such high-dimensional dynamic PDE-constrained optimization problems are extremely challenging or even intractable. It has been recently shown that inverse problems can be solved by only considering certain robust features. Here we consider the ICF capsule's deuterium-tritium (DT) interface, and construct a causal, dynamic, multifidelity reduced-order surrogate that maps from a time-dependent radiation temperature drive to the interface's radius and velocity dynamics. The surrogate targets an ODE embedding of DT interface dynamics, and is constructed by learning a controller for a base analytical model using low- and high-fidelity simulation training data with respect to radiation energy group structure. After demonstrating excellent accuracy of the surrogate interface model, we use machine learning (ML) models with surrogate-generated data to solve inverse problems optimizing radiation temperature drive to reproduce observed interface dynamics. For sparse snapshots in time, the ML model further characterizes the most informative times at which to sample dynamics. Altogether we demonstrate how operator learning, causal architectures, and physical inductive bias can be integrated to accelerate discovery, design, and diagnostics in high-energy-density systems.

2510.05013 2026-06-04 stat.ML cs.LG

Curiosity-Driven Development of Action and Language in Robots Through Self-Exploration

Theodore Jerome Tinker, Kenji Doya, Jun Tani

Affiliations Okinawa Institute of Science and Technology

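AI summary Through curiosity-driven robot self-exploration, with active inference implemented via Q-learning, the study reveals compositional generalization, faster learning, rote pairing that precedes composition, and a U-shaped developmental pattern caused by exception handling, offering an account of efficient human language acquisition.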

Comments 27 pages, 22 pages of supplementary material

Abstract

Infants acquire language with generalization from minimal experience, whereas large language models require billions of training tokens. What underlies efficient development in humans? We investigated this problem through experiments wherein robotic agents learn to perform actions associated with imperative sentences (e.g., push red cube) via curiosity-driven self-exploration. Our approach amortizes active inference using Q-learning, enabling intrinsically motivated developmental learning. The simulations reveal key findings corresponding to observations in developmental psychology. i) Generalization improves drastically as the scale of compositional elements increases. ii) Curiosity-driven exploration enables faster learning. iii) Rote pairing of sentences and actions precedes compositional generalization. iv) Exception-handling induces U-shaped developmental performance, a pattern like representational redescription in child language learning. These results suggest that curiosity-driven active inference accounts for how intrinsically motivated sensorimotor-linguistic learning supports scalable compositional generalization and exception handling in humans and artificial agents.

2512.10236 2026-06-04 cs.DC cs.AR cs.LG

Design Space Exploration of DMA based Finer-Grain Compute Communication Overlap

Shagnik Pal, Shaizeen Aga, Suchita Pati, Mahzabeen Islam, Lizy K. John

Affiliations Advanced Micro Devices Inc.; The University of Texas at Austin

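AI summary Proposes FiCCO, a finer-grain compute-communication overlap approach that operates below shard granularity and offloads communication to DMA engines, enabling performance optimization across a wider range of network topologies with speedups of up to 1.6x.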

Abstract

Modern ML workloads demand distributing training and inference across multiple GPUs. However, these parallelization techniques often suffer from exposed critical-path communication, leaving a potential 1.7x speedup on the table through compute-communication overlap. Prior overlapping methods harness the fact that ML model state and inputs are already sharded into the number of GPUs, and overlap the compute and communication at shard granularity. However, such coarse-grained overlap suffers from limited network topology support, and suboptimal dataflows. In this work, we instead make a case for finer-grain compute-communication overlap which we term FiCCO. FiCCO operates one level deeper than traditional sharding, and unlocks overlap for a wider set of network topologies and enables finer-grain dataflow. We show that FiCCO opens up a wider design space of execution schedules than possible at shard-level alone. To walk the design space of schedules, we study and characterize the performance inefficiencies on doing overlap and overlay the schedules with the associated inefficiency signatures. Our characterization reveals decomposition and contention based slowdowns to be the major performance limiters, and we correlate the slowdown factors with the static compute/communication operator sizes. This helps us design heuristics (that frameworks and runtimes can harness) to select bespoke FiCCO schedules based on the nature of underlying ML operations. Finally, to further minimize contention inefficiencies inherent with operation overlap, we offload communication to GPU DMA engines. We evaluate several scenarios from realistic ML deployments and demonstrate that our proposed heuristics driven bespoke schedules deliver up to 1.6x speedup. Further, our heuristics provide accurate guidance to pick the optimal schedule in 81% of unseen scenarios.

2512.06553 2026-06-04 stat.AP cs.LG

A Latent Variable Framework for Scaling Laws in Large Language Models

大型语言模型中缩放定律的潜变量框架

Peiyao Cai, Chengyu Cui, Felipe Maia Polo, Seamus Somerstep, Leshem Choshen, Mikhail Yurochkin, Yuekai Sun, Kean Ming Tan, Gongjun Xu

发表机构 * Department of Statistics, University of Michigan(密歇根大学统计系) IBM Research and CSAIL, MIT(IBM研究与麻省理工学院计算机科学与人工智能实验室) Institute of Foundation Models, MBZUAI(MBZUAI基础模型研究所)

AI总结 提出基于潜变量建模的统计框架,通过引入潜变量捕获不同模型家族和基准的异构性,以更准确地建模大型语言模型的缩放定律。

详情
AI中文摘要

我们提出了一个基于潜变量建模的统计框架,用于大型语言模型(LLMs)的缩放定律。我们的工作受到大量具有不同架构和训练策略的新LLM家族迅速涌现的推动,这些模型在越来越多的基准上进行评估。这种异构性使得单一的全局缩放曲线不足以捕捉不同家族和基准之间的性能变化。为了解决这个问题,我们提出了一个潜变量建模框架,其中每个LLM家族与一个潜变量相关联,该潜变量捕获该家族中常见的底层特征。然后,LLM在不同基准上的性能由其潜在技能驱动,这些技能由潜变量和模型自身的可观测特征共同决定。我们开发了该潜变量模型的估计程序,并建立了其统计性质。我们还设计了支持估计和各种下游任务的高效数值算法。在实验上,我们在Open LLM Leaderboard(v1/v2)的12个广泛使用的基准上评估了该方法。

英文摘要

We propose a statistical framework built on latent variable modeling for scaling laws of large language models (LLMs). Our work is motivated by the rapid emergence of numerous new LLM families with distinct architectures and training strategies, evaluated on an increasing number of benchmarks. This heterogeneity makes a single global scaling curve inadequate for capturing how performance varies across families and benchmarks. To address this, we propose a latent variable modeling framework in which each LLM family is associated with a latent variable that captures the common underlying features in that family. An LLM's performance on different benchmarks is then driven by its latent skills, which are jointly determined by the latent variable and the model's own observable features. We develop an estimation procedure for this latent variable model and establish its statistical properties. We also design efficient numerical algorithms that support estimation and various downstream tasks. Empirically, we evaluate the approach on 12 widely used benchmarks from the Open LLM Leaderboard (v1/v2).

2512.03296 2026-06-04 cs.SI cs.CY cs.LG

Associating Healthcare Teamwork with Patient Outcomes for Predictive Analysis

将医疗团队协作与患者结局关联以进行预测分析

Hsiao-Ying Lu, Kwan-Liu Ma

发表机构 * Department of Computer Science, University of California, Davis, USA(加州大学戴维斯分校计算机科学系)

AI总结 本研究通过电子健康记录系统建模医疗专业人员协作网络,应用机器学习技术检测患者生存预测信号,并识别与改善结局相关的关键网络特征,经临床专家验证其现实应用潜力。

详情
AI中文摘要

癌症治疗结局不仅受临床和人口统计学因素影响,还受医疗团队协作的影响。然而,先前的工作在很大程度上忽视了人类协作在塑造患者生存中的潜在作用。本文提出了一种应用人工智能方法,通过电子健康记录(EHR)系统捕获的医疗专业人员(HCP)协作,揭示其对癌症患者结局的影响。我们将EHR介导的HCP交互建模为网络,并应用机器学习技术检测这些协作中嵌入的患者生存预测信号。我们的模型经过交叉验证以确保泛化能力,并通过识别与改善结局相关的关键网络特征来解释预测。重要的是,临床专家和文献验证了所识别关键协作特征的相关性,增强了其在现实应用中的潜力。这项工作为利用协作的数字痕迹和人工智能评估及改善基于团队的医疗保健提供了实用工作流程。该方法可能可转移到涉及复杂协作的其他领域,并提供可操作的见解以支持医疗保健服务中的数据知情干预。

英文摘要

Cancer treatment outcomes are influenced not only by clinical and demographic factors but also by the collaboration of healthcare teams. However, prior work has largely overlooked the potential role of human collaboration in shaping patient survival. This paper presents an applied AI approach to uncovering the impact of healthcare professionals' (HCPs) collaboration, captured through electronic health record (EHR) systems, on cancer patient outcomes. We model EHR-mediated HCP interactions as networks and apply machine learning techniques to detect predictive signals of patient survival embedded in these collaborations. Our models are cross-validated to ensure generalizability, and we explain the predictions by identifying key network traits associated with improved outcomes. Importantly, clinical experts and the literature validate the relevance of the identified crucial collaboration traits, reinforcing their potential for real-world applications. This work contributes a practical workflow for leveraging digital traces of collaboration and AI to assess and improve team-based healthcare. The approach is potentially transferable to other domains involving complex collaboration and offers actionable insights to support data-informed interventions in healthcare delivery.

2408.04607 2026-06-04 stat.ML cond-mat.dis-nn cs.LG

Risk and cross validation in ridge regression with correlated samples

带相关样本的岭回归中的风险与交叉验证

Alexander Atanasov, Jacob A. Zavatone-Veth, Cengiz Pehlevan

发表机构 * Department of Physics, Harvard University(哈佛大学物理系) Center for Brain Science, Harvard University(哈佛大学脑科学中心) Society of Fellows, Harvard University(哈佛大学 fellows 会) John A. Paulson School of Engineering and Applied Sciences, Harvard University(哈佛大学约翰·A·保罗森工程与应用科学学院) Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University(哈佛大学自然与人工智能研究学院)

AI总结 利用随机矩阵理论和自由概率,研究了数据点具有任意相关性时岭回归的渐近风险,并提出了修正的广义交叉验证估计器CorrGCV,同时扩展到测试点与训练集相关的情况。

Comments 50 pages, 19 figures. v4: ICML 2025 camera-ready. v5: Fix typo in statement of Theorem 5. v6: typos corrected, to appear in 2026 JSTAT Machine Learning focus collection

详情
Journal ref
International Conference on Machine Learning (2025), https://proceedings.mlr.press/v267/atanasov25a.html
AI中文摘要

近年来,我们对高维岭回归的理解取得了实质性进展,但现有理论假设训练样本是独立的。通过利用随机矩阵理论和自由概率的技术,我们为数据点具有任意相关性时岭回归的样本内和样本外风险提供了精确的渐近结果。我们证明,在这种情况下,广义交叉验证估计器(GCV)无法正确预测样本外风险。然而,当噪声残差与数据点具有相同相关性时,可以修改GCV以产生一个在高维极限下集中的高效可计算无偏估计器,我们称之为CorrGCV。我们进一步将渐近分析扩展到测试点与训练集具有非平凡相关性的情况,这是时间序列预测中经常遇到的情况。假设已知时间序列的相关结构,这再次产生了GCV估计器的扩展,并精确刻画了此类测试点对长期风险产生过于乐观预测的程度。我们在各种高维数据上验证了理论的预测。

英文摘要

Recent years have seen substantial advances in our understanding of high-dimensional ridge regression, but existing theories assume that training examples are independent. By leveraging techniques from random matrix theory and free probability, we provide sharp asymptotics for the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations. We demonstrate that in this setting, the generalized cross validation estimator (GCV) fails to correctly predict the out-of-sample risk. However, in the case where the noise residuals have the same correlations as the data points, one can modify the GCV to yield an efficiently computable unbiased estimator that concentrates in the high-dimensional limit, which we dub CorrGCV. We further extend our asymptotic analysis to the case where the test point has nontrivial correlations with the training set, a setting often encountered in time series forecasting. Assuming knowledge of the correlation structure of the time series, this again yields an extension of the GCV estimator, and sharply characterizes the degree to which such test points yield an overly optimistic prediction of long-time risk. We validate the predictions of our theory across a variety of high-dimensional data.
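For readers unfamiliar with the baseline this abstract refers to, the following is a minimal sketch of the classical GCV estimator for ridge regression with independent samples. It is not the CorrGCV correction proposed in the paper. The function names `solve` and `ridge_gcv`, the toy data, and the scaling of the penalty as n·λ are assumptions made for illustration; other references scale λ differently.

```python
def solve(A, b):
    """Solve the square linear system A x = b by Gaussian elimination."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]  # augmented matrix [A | b]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ridge_gcv(X, y, lam):
    """Return (training error, classical GCV estimate) for ridge regression.

    Assumed convention: w = argmin (1/n)||y - Xw||^2 + lam * ||w||^2,
    so the normal equations use X^T X + n * lam * I.
    """
    n, p = len(X), len(X[0])
    XtX = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
           for i in range(p)]
    G = [[XtX[i][j] + (n * lam if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    Xty = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    w = solve(G, Xty)
    resid = [y[k] - sum(X[k][i] * w[i] for i in range(p)) for k in range(n)]
    train_err = sum(r * r for r in resid) / n
    # Effective degrees of freedom: tr(S) = tr(G^{-1} X^T X).
    df = sum(solve(G, [XtX[j][i] for j in range(p)])[i] for i in range(p))
    return train_err, train_err / (1.0 - df / n) ** 2


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [1.0, 2.0, 3.0, 4.0]
print(ridge_gcv(X, y, 0.1))
```

The GCV value inflates the training error by the factor 1/(1 - tr(S)/n)^2. According to the abstract, this estimate fails to predict the out-of-sample risk once the samples are correlated.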

2511.03000 2026-06-04 stat.ML cs.IT cs.LG math.IT

Unifying Information-Theoretic and Pair-Counting Clustering Similarity

统一信息论与配对计数的聚类相似性

Alexander J. Gates

发表机构 * School of Data Science, University of Virginia(数据科学学院,弗吉尼亚大学)

AI总结 本文通过加权展开和高阶扩展两个视角,统一了配对计数与信息论两类聚类相似性度量,揭示了它们之间的分析联系。

Comments 23 pages, 2 figures

详情
AI中文摘要

比较聚类结果对于评估无监督模型至关重要,然而现有的许多相似性度量可能产生广泛分歧、有时甚至矛盾的评估。聚类相似性度量通常分为两大族:配对计数和信息论,分别反映它们是通过元素对还是通过完整聚类列联表的聚合信息来量化一致性。先前的工作已发现这些族之间的相似性,并应用了经验归一化或机会校正方案,但它们更深层的分析联系仍仅部分被理解。在此,我们开发了一个分析框架,通过两个互补视角统一这些族。首先,两个族都表示为观察到的与期望的共现的加权展开,配对计数作为二次低阶近似出现,而信息论度量作为高阶频率加权扩展。其次,我们将配对计数推广到k元组一致性,并表明信息论度量可以被视为系统性地累积超出成对水平的高阶共分配结构。我们针对Rand指数和互信息从分析上说明了这些方法,并展示了每个族中的其他指数如何作为自然扩展出现。总之,这些观点阐明了两个体系何时以及为何产生分歧,将它们的敏感性直接与权重和近似阶数联系起来,并为跨应用选择、解释和扩展聚类相似性度量提供了原则性基础。

英文摘要

Comparing clusterings is central to evaluating unsupervised models, yet the many existing similarity measures can produce widely divergent, sometimes contradictory, evaluations. Clustering similarity measures are typically organized into two principal families, pair-counting and information-theoretic, reflecting whether they quantify agreement through element pairs or aggregate information across full cluster contingency tables. Prior work has uncovered parallels between these families and applied empirical normalization or chance-correction schemes, but their deeper analytical connection remains only partially understood. Here, we develop an analytical framework that unifies these families through two complementary perspectives. First, both families are expressed as weighted expansions of observed versus expected co-occurrences, with pair-counting arising as a quadratic, low-order approximation and information-theoretic measures as higher-order, frequency-weighted extensions. Second, we generalize pair-counting to k-tuple agreement and show that information-theoretic measures can be viewed as systematically accumulating higher-order co-assignment structure beyond the pairwise level. We illustrate the approaches analytically for the Rand index and Mutual Information, and show how other indices in each family emerge as natural extensions. Together, these views clarify when and why the two regimes diverge, relating their sensitivities directly to weighting and approximation order, and provide a principled basis for selecting, interpreting, and extending clustering similarity measures across applications.
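As a concrete reference point for the two families named in this abstract, the sketch below computes one standard representative of each: the Rand index (pair-counting) and mutual information (information-theoretic), both from the same contingency counts. It illustrates the textbook definitions only, not the paper's unifying expansion. The function names and toy labelings are illustrative assumptions.

```python
from collections import Counter
from math import comb, log


def rand_index(a, b):
    """Fraction of element pairs on which clusterings a and b agree."""
    n = len(a)
    total = comb(n, 2)
    # Pairs placed together in both, in a only, and in b only.
    together_both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    together_a = sum(comb(c, 2) for c in Counter(a).values())
    together_b = sum(comb(c, 2) for c in Counter(b).values())
    apart_both = total - together_a - together_b + together_both
    return (together_both + apart_both) / total


def mutual_information(a, b):
    """Mutual information (in nats) between two labelings."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    return sum(
        (c / n) * log(c * n / (count_a[i] * count_b[j]))
        for (i, j), c in Counter(zip(a, b)).items()
    )


same = ([0, 0, 1, 1], [1, 1, 0, 0])       # identical up to relabeling
crossed = ([0, 0, 1, 1], [0, 1, 0, 1])    # statistically independent
print(rand_index(*same), mutual_information(*same))
print(rand_index(*crossed), mutual_information(*crossed))
```

The Rand index only looks at pairs of elements, while mutual information uses the full cell frequencies of the contingency table, which is the distinction the abstract's "low-order" versus "higher-order" framing builds on.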

2502.00470 2026-06-04 math.OC cs.LG stat.ML

On the Relationship Between CoCoA and ADMM for Distributed Empirical Risk Minimization

关于CoCoA与ADMM在分布式经验风险最小化中的关系

Runxiong Wu, Andi Wang

发表机构 * Department of Industrial & Systems Engineering, University of Wisconsin–Madison(工业与系统工程系,威斯康星大学麦迪逊分校)

AI总结 本文从统一原始-对偶视角揭示CoCoA与ADMM两类分布式ERM算法的内在联系,证明岭正则化下CoCoA等价于特定近端ADMM方案,并给出ADMM型方法的统一收敛分析和早停准则。

Comments 21 pages, 4 figures, 1 table

详情
Journal ref
Published in Transactions on Machine Learning Research (06/2026)
AI中文摘要

分布式经验风险最小化(ERM)通常通过两类有影响力但看似独立的方法来研究:源自分布式对偶坐标上升的CoCoA型算法,以及源自共识和近端分裂的ADMM型算法。本文从统一的原始-对偶视角研究这两类算法的联系。我们证明共识ADMM、线性化共识ADMM、两种分布式近端ADMM变体以及岭正则化CoCoA都可以写成一种涉及全局原始变量和块对偶变量的通用更新形式。这种重新表述使几个先前隐藏的联系变得明确:对于岭正则化ERM,CoCoA在对偶更新层面上与特定的近端ADMM方案一致。此外,原始问题上的共识ADMM等价于对偶问题上的近端ADMM,并具有显式参数映射以及鞍点目标符号反转;线性化变体也存在类似的对应关系。这些结果表明,在岭正则化ERM问题下,经过精细调参的ADMM型算法至少与CoCoA性能相当。统一视角还为共识ADMM提供了自然的原始-对偶间隙早停准则,并为ADMM型方法提供了统一的$O(1/T)$遍历收敛分析。在合成回归问题和真实SVM数据集上的实验支持了预测的关系,阐明了调参的作用,并表明适当调参的ADMM变体在岭正则化设置下可以优于CoCoA。

英文摘要

Distributed empirical risk minimization (ERM) is often studied through two influential yet seemingly separate families of methods: CoCoA-type algorithms, derived from distributed dual coordinate ascent, and ADMM-type algorithms, derived from consensus and proximal splitting. In this paper, we investigate the connection between the two types of algorithms from a unified primal-dual perspective. We show that consensus ADMM, linearized consensus ADMM, two distributed proximal ADMM variants, and ridge-regularized CoCoA can all be written in a common update form involving a global primal variable and block dual variables. This reformulation makes several previously hidden connections explicit: for ridge-regularized ERM, CoCoA coincides with a particular proximal ADMM scheme at the level of the dual update. Moreover, consensus ADMM on the primal problem is equivalent to proximal ADMM on the dual problem under an explicit parameter mapping together with a sign reversal of the saddle objective; similar correspondences also hold for the linearized variants. These results indicate that ADMM-type algorithms, when finely tuned, perform at least as well as CoCoA on ridge-regularized ERM problems. The unified view also yields a natural primal-dual gap stopping criterion for consensus ADMM and a unified $O(1/T)$ ergodic convergence analysis for the ADMM-type methods. Experiments on synthetic regression problems and real SVM datasets support the predicted relationships, clarify the role of tuning parameters, and show that suitably tuned ADMM variants can outperform CoCoA in the ridge-regularized setting.
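To make the phrase "a global primal variable and block dual variables" concrete, here is a toy sketch of textbook consensus ADMM in scaled form. It is applied to a deliberately simple problem that is assumed for illustration and is not the paper's ridge-regularized ERM setup: each of n workers holds a scalar a_i and a local loss 0.5·(x − a_i)², so the consensus solution is the mean of the a_i. The function name and the parameter `rho` are illustrative.

```python
def consensus_admm_mean(a, rho=1.0, iters=100):
    """Scaled-form consensus ADMM for min_z sum_i 0.5 * (z - a_i)^2.

    x[i] : local primal copy held by worker i
    z    : global (consensus) primal variable
    u[i] : scaled dual variable for the constraint x[i] = z
    """
    n = len(a)
    x = [0.0] * n
    u = [0.0] * n
    z = 0.0
    for _ in range(iters):
        # Local step: argmin_x 0.5*(x - a_i)^2 + (rho/2)*(x - z + u_i)^2.
        x = [(a[i] + rho * (z - u[i])) / (1.0 + rho) for i in range(n)]
        # Global step: average the local copies plus their scaled duals.
        z = sum(x[i] + u[i] for i in range(n)) / n
        # Dual step: accumulate the remaining consensus violation.
        u = [u[i] + x[i] - z for i in range(n)]
    return z


print(consensus_admm_mean([1.0, 2.0, 6.0]))
```

On this toy problem the error in z shrinks by a factor of rho/(1 + rho) per iteration, which gives a small-scale view of why the choice of tuning parameter matters, as the abstract emphasizes.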

2508.08237 2026-06-04 cs.MM cs.AI cs.CV cs.SD eess.AS

VGGSounder: Audio-Visual Evaluations for Foundation Models

VGGSounder:基础模型的音视频评估

Daniil Zverev, Thaddäus Wiedemer, Ameya Prabhu, Matthias Bethge, Wieland Brendel, A. Sophia Koepke

发表机构 * Technical University of Munich, MCML(慕尼黑技术大学,MCML) University of Tübingen(图宾根大学) Tübingen AI Center(图宾根人工智能中心) MPI for Intelligent Systems, ELLIS Institute(智能系统Max Planck研究所,ELLIS研究所)

AI总结 针对VGGSound数据集在音视频基础模型评估中的标签不完整、类别重叠和模态错位等问题,提出重新标注的多标签测试集VGGSounder,并引入模态混淆指标分析模型性能退化。

Comments Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) 2025

详情
AI中文摘要

音视频基础模型的出现凸显了可靠评估其多模态理解能力的重要性。VGGSound数据集常被用作评估音视频分类的基准。然而,我们的分析发现了VGGSound的几个局限性,包括标签不完整、部分类别重叠以及模态错位。这些问题导致对听觉和视觉能力的评估出现偏差。为了解决这些局限性,我们引入了VGGSounder,这是一个全面重新标注的多标签测试集,它扩展了VGGSound,并专门设计用于评估音视频基础模型。VGGSounder具有详细的模态标注,能够精确分析特定模态的性能。此外,通过我们新的模态混淆指标,我们分析了添加另一种输入模态时的性能退化,揭示了模型的局限性。

英文摘要

The emergence of audio-visual foundation models underscores the importance of reliably assessing their multi-modal understanding. The VGGSound dataset is commonly used as a benchmark for evaluating audio-visual classification. However, our analysis identifies several limitations of VGGSound, including incomplete labelling, partially overlapping classes, and misaligned modalities. These lead to distorted evaluations of auditory and visual capabilities. To address these limitations, we introduce VGGSounder, a comprehensively re-annotated, multi-label test set that extends VGGSound and is specifically designed to evaluate audio-visual foundation models. VGGSounder features detailed modality annotations, enabling precise analyses of modality-specific performance. Furthermore, we reveal model limitations by analysing performance degradation when adding another input modality with our new modality confusion metric.

2509.23385 2026-06-04 stat.ML cs.LG

Flow Matching Calibration for Simulation-Based Inference under Model Misspecification

模型误设定下基于模拟推断的流匹配校准

Pierre-Louis Ruhlmann, Michael Arbel, Florence Forbes, Pedro L. C. Rodrigues

发表机构 * Institut national de physique de la matière (CNRS UMR 7586)(物质物理国家研究院(CNRS UMR 7586))

AI总结 针对基于模拟推断中模型误设定导致的偏差,提出流匹配校正后验估计方法,通过少量校准样本利用流匹配范式修正后验估计器,提高推断准确性和不确定性量化。

详情
AI中文摘要

基于模拟的推断(SBI)通过从模拟数据中估计复杂非线性模型的参数,正在变革实验科学。然而,一个持续的挑战是模型误设定。在贝叶斯设置中,针对后验分布,误差可能来自模拟器、噪声或先验建模。这些模型组件只是现实世界的近似,严重的不匹配可能导致有偏或过于自信的后验。我们通过引入流匹配校正后验估计(FMCPE)来解决这个问题,该框架利用流匹配范式,使用少量校准样本细化基于模拟训练的后验估计器。我们的方法分两个阶段进行:首先,在大量模拟数据上训练后验近似器;其次,流匹配将其预测向由校准观测支持的真实后验传输。我们依靠后者来指导校正,无需明确知道误设定形式或哪些模型组件受到影响。这种设计使FMCPE能够结合SBI的可扩展性和对分布偏移的鲁棒性。在合成基准和真实世界数据集上,我们表明我们的提议一致地减轻了误设定的影响,与标准SBI基线相比,提供了改进的推断准确性和不确定性量化,同时保持计算效率。

英文摘要

Simulation-based inference (SBI) is transforming experimental sciences by enabling parameter estimation in complex non-linear models from simulated data. A persistent challenge, however, is model misspecification. In a Bayesian setting, targeting posterior distributions, errors may arise from the simulator, the noise or prior modelling. These model components are only approximations of reality, and severe mismatches can yield biased or overconfident posteriors. We address this issue by introducing Flow Matching Corrected Posterior Estimation (FMCPE), a framework that leverages the flow matching paradigm to refine simulation-trained posterior estimators using a small set of calibration samples. Our approach proceeds in two stages: first, a posterior approximator is trained on abundant simulated data; second, flow matching transports its predictions toward the true posterior supported by calibration observations. We rely on the latter to guide the correction, without requiring explicit knowledge of the misspecification form or of which model components are affected. This design enables FMCPE to combine the scalability of SBI with robustness to distributional shift. Across synthetic benchmarks and real-world datasets, we show that our proposal consistently mitigates the effects of misspecification, delivering improved inference accuracy and uncertainty quantification compared to standard SBI baselines, while remaining computationally efficient.

2509.21597 2026-06-04 eess.AS cs.CL cs.SD

AUDDT: A Unified Benchmark Toolkit for Audio and Speech Deepfake Detectors

AUDDT:音频与语音深度伪造检测器的统一基准工具包

Yi Zhu, Heitor R. Guimarães, Arthur Pimentel, Tiago Falk

发表机构 * MuSAELab(MuSAELab实验室)

AI总结 本文提出AUDDT开源基准工具包,通过整合31个数据集并自动化评估预训练检测器,系统分析了深度伪造检测在不同操作类型和录音条件下的泛化能力与性能差异。

详情
AI中文摘要

随着人工智能生成内容(如音频深度伪造)的普及,近期大量工作聚焦于开发深度伪造检测技术。然而,现有基准仅使用少量数据集,使得检测器在真实世界条件下的泛化能力不确定。本文系统回顾了31个现有音频深度伪造数据集,并提出了一个名为AUDDT(https://github.com/MuSAELab/AUDDT)的开源基准测试工具包。该工具包旨在自动化评估预训练检测器在广泛语音和非语音音频数据集上的性能,为用户提供其深度伪造检测器在不同操作类型和录音条件下的优缺点直接反馈。我们首先展示了所开发工具包的使用方法、基准的组成以及不同深度伪造子组的细分。接着,我们强调了AUDDT与现有基准工作的不同之处,即通过大规模、多样化的现代欺骗方法评估以及通过全面的元数据注释进行更丰富的属性级分析。使用一个广泛采用的预训练深度伪造检测器,我们展示了域内和域外检测结果,揭示了在不同条件和音频操作类型下显著的性能差异。最后,我们还分析了这些现有数据集的局限性及其与实际部署场景之间的差距。

英文摘要

With the prevalence of artificial intelligence (AI)-generated content, such as audio deepfakes, a large body of recent work has focused on developing deepfake detection techniques. However, existing benchmarks employ a narrow set of datasets, leaving detector generalization to real-world conditions uncertain. In this paper, we systematically review 31 existing audio deepfake datasets and present an open-source benchmarking toolkit called AUDDT (https://github.com/MuSAELab/AUDDT). The goal of this toolkit is to automate the evaluation of pretrained detectors across a wide range of speech and non-speech audio datasets, giving users direct feedback on the advantages and shortcomings of their deepfake detectors under diverse manipulation types and recording conditions. We start by showcasing the usage of the developed toolkit, the composition of our benchmark, and the breakdown of different deepfake subgroups. Next, we highlight how AUDDT differs from existing benchmarking efforts by enabling large-scale, diverse evaluation across modern spoofing methods and richer attribute-level analysis through comprehensive metadata annotation. Using a widely adopted pretrained deepfake detector, we present in- and out-of-domain detection results, revealing notable performance variability across different conditions and audio manipulation types. Lastly, we also analyze the limitations of these existing datasets and their gaps relative to practical deployment scenarios.

2508.14623 2026-06-04 eess.AS cs.AI cs.SD

A Study of the Scale Invariant Signal to Distortion Ratio in Speech Separation with Noisy References

带噪参考下语音分离中尺度不变信失真比的研究

Simon Dahl Jepsen, Mads Græsbøll Christensen, Jesper Rindom Jensen

发表机构 * European Union(欧洲联盟)

AI总结 本文研究了在训练参考包含噪声时,使用尺度不变信失真比作为评估和训练目标的影响,提出通过增强参考和混合数据来避免学习噪声参考,实验表明可减少分离语音中的噪声但可能引入伪影。

Comments Accepted for IEEE ASRU 2025, Workshop on Automatic Speech Recognition and Understanding. Copyright (c) 2025 IEEE. 8 pages, 6 figures, 2 tables

详情
Journal ref
2025 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Honolulu, HI, USA, 2025, pp. 1-8
AI中文摘要

本文研究了在监督语音分离中,当训练参考包含噪声时(如事实上的基准WSJ0-2Mix),使用尺度不变信失真比(SI-SDR)作为评估和训练目标的影响。对带噪参考的SI-SDR推导表明,噪声限制了可实现的SI-SDR,或导致分离输出中出现不希望的噪声。为了解决这个问题,提出了一种增强参考并用WHAM!扩充混合数据的方法,旨在训练避免学习噪声参考的模型。使用非侵入式NISQA.v2指标评估了在这些增强数据集上训练的两个模型。结果显示分离语音中的噪声减少,但表明处理参考可能引入伪影,限制了整体质量提升。在WSJ0-2Mix和Libri2Mix测试集上,各模型的SI-SDR与感知噪声之间存在负相关,这印证了推导的结论。

英文摘要

This paper examines the implications of using the Scale-Invariant Signal-to-Distortion Ratio (SI-SDR) as both evaluation and training objective in supervised speech separation, when the training references contain noise, as is the case with the de facto benchmark WSJ0-2Mix. A derivation of the SI-SDR with noisy references reveals that noise limits the achievable SI-SDR, or leads to undesired noise in the separated outputs. To address this, a method is proposed to enhance references and augment the mixtures with WHAM!, aiming to train models that avoid learning noisy references. Two models trained on these enhanced datasets are evaluated with the non-intrusive NISQA.v2 metric. Results show reduced noise in separated speech but suggest that processing references may introduce artefacts, limiting overall quality gains. Negative correlation is found between SI-SDR and perceived noisiness across models on the WSJ0-2Mix and Libri2Mix test sets, underlining the conclusion from the derivation.
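For orientation, the sketch below computes SI-SDR under its usual definition: project the estimate onto the reference, treat the projection as the target and the remainder as distortion, and take the energy ratio in dB. It covers only the clean-reference metric, not the paper's derivation for noisy references. The function name and the toy signals are illustrative assumptions.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant signal-to-distortion ratio in dB.

    Raises ZeroDivisionError if the estimate is an exact scaled copy of
    the reference (zero distortion) or if the reference is all zeros.
    """
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    alpha = dot / ref_energy  # optimal scaling of the reference
    target = [alpha * r for r in reference]
    distortion = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    distortion_energy = sum(d * d for d in distortion)
    return 10.0 * math.log10(target_energy / distortion_energy)


reference = [1.0, 0.0, -1.0, 0.0]
estimate = [1.0, 0.1, -1.0, -0.1]
print(si_sdr(estimate, reference))                    # about 20 dB
print(si_sdr([3.0 * e for e in estimate], reference)) # unchanged by rescaling
```

Because the reference itself defines the target direction, any noise contained in the reference is counted as signal rather than distortion, which is the situation the abstract analyses.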

2506.23546 2026-06-04 q-bio.NC cond-mat.dis-nn cs.LG cs.NE

Neural Langevin Machine: a local asymmetric learning rule can be creative

神经朗之万机:一种局部非对称学习规则可以具有创造性

Zhendong Yu, Weizhong Huang, Haiping Huang

发表机构 * PMI Lab, School of Physics, Sun Yat-sen University(物理系,中山大学,PMI实验室) Guangdong Provincial Key Laboratory of Magnetoelectric Physics and Devices, Sun Yat-sen University(磁电物理与器件广东省重点实验室,中山大学)

AI总结 本文提出神经朗之万机,利用递归神经网络的固定点通过非对称、速率调整的局部学习规则进行生成学习,并揭示了非平衡生成过程及记忆到泛化的转变。

Comments 7 pages, 5 figures, with Github link in the paper, supplemental material available upon request

详情
AI中文摘要

递归神经网络的固定点可用于存储和生成信息。这些固定点可以通过玻尔兹曼-吉布斯测度捕获,从而得到神经朗之万动力学,可用于在真实数据集的生成学习中找到它们。我们将这种生成模型称为神经朗之万机,它推导出一种非对称且放电速率调整的学习规则,仅需要局部神经信号,因此在局部预测学习方面具有生物学相关性。揭示了生成过程中一个有趣的非平衡状态,以及随着训练数据量增加从记忆到泛化的转变。这种神经启发机器还可以实现对不同种类生成图像的相空间连续探索,并且能够对受损图像进行去噪。

英文摘要

Fixed points of recurrent neural networks can be leveraged to store and generate information. These fixed points can be captured by the Boltzmann-Gibbs measure, which leads to neural Langevin dynamics that can be used to find them for generative learning of a real dataset. We call this type of generative model a neural Langevin machine, which derives an asymmetric and firing-rate-speed adjusted learning rule requiring only local neural signals, thereby bearing biological relevance in terms of local predictive learning. An interesting out-of-equilibrium regime of the generative process is revealed, together with a memorization-to-generalization transition with increasing training data size. The neuro-inspired machine can also realize a continuous exploration of the phase space for different kinds of generative images and can denoise a corrupted image as well.

2505.21331 2026-06-04 cs.DS cs.GT cs.LG cs.PF math.PR

Scheduling in Queueing Systems with Uncertain and Evolving Holding Costs

具有不确定和演化持有成本的排队系统中的调度

Caner Gocmen, Thodoris Lykouris, Deeksha Sinha, Wentao Weng

发表机构 * Meta Platforms(Meta平台) Massachusetts Institute of Technology(麻省理工学院)

AI总结 针对持有成本不确定且演化的排队系统,提出基于马尔可夫链的模型和机会调整剩余成本(OaRC)算法,证明其渐近最优性并优于经典规则。

详情
AI中文摘要

在社交媒体平台的内容审核中,延迟审核内容的成本与其观看轨迹成正比,而观看轨迹是波动的且先验未知。受这种不确定且演化的持有成本的启发,我们考虑一个排队模型,其中作业状态基于马尔可夫链演化,并具有状态相关的瞬时持有成本。我们证明,在存在这种不确定且演化的持有成本的情况下,两个经典算法原则——瞬时成本($cμ$规则)和期望剩余成本($cμ/θ$规则)——是次优的。通过将每个作业视为一个马尔可夫滑雪租赁问题,我们开发了一种新的基于索引的算法——机会调整剩余成本(OaRC),该算法在不确定性部分解决时调整到未来服务作业的机会。我们证明OaRC的次优性差距为$\tilde{O}(\sqrt{N})$,其中$N$是系统规模。这个界限表明,当系统规模$N$趋于无穷时,OaRC对于过载系统实现了渐近最优性。此外,该界限与状态空间大小无关,这在作业状态包含上下文信息时是一个理想性质。我们基于社交媒体平台内容审核中出现的两种持有成本模式(在线广告和用户生成内容)进行了广泛的模拟研究,验证了我们的结果。基于合成和真实数据集的模拟表明,OaRC始终优于基于两个经典算法原则的现有实践。

英文摘要

In content moderation for social media platforms, the cost of delaying the review of a piece of content is proportional to its view trajectory, which fluctuates and is a priori unknown. Motivated by such uncertain and evolving holding costs, we consider a queueing model where job states evolve based on a Markov chain with state-dependent instantaneous holding costs. We demonstrate that in the presence of such uncertain and evolving holding costs, the two canonical algorithmic principles, instantaneous-cost ($cμ$-rule) and expected-remaining-cost ($cμ/θ$-rule), are suboptimal. By viewing each job as a Markovian ski-rental problem, we develop a new index-based algorithm, Opportunity-adjusted Remaining Cost (OaRC), that adjusts to the opportunity of serving jobs in the future when uncertainty partly resolves. We show that the suboptimality gap of OaRC scales as $\tilde{O}(\sqrt{N})$, where $N$ is the system size. This bound shows that OaRC achieves asymptotic optimality for overloaded systems when the system size $N$ scales to infinity. Moreover, the bound is independent of the state-space size, which is a desirable property when job states contain contextual information. We corroborate our results with an extensive simulation study based on two holding cost patterns (online ads and user-generated content) that arise in content moderation for social media platforms. Our simulations based on synthetic and real datasets demonstrate that OaRC consistently outperforms existing practice, which is based on the two canonical algorithmic principles.
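The two baseline rules named in this abstract are index policies: each waiting job gets a score and the server takes the highest one. The sketch below shows that mechanism only and does not implement OaRC. The field names, the toy numbers, and the reading of θ as the rate at which an unserved job leaves the system are assumptions for illustration; the paper's exact definitions may differ.

```python
def cmu_index(job):
    """Instantaneous-cost index: holding cost rate times service rate."""
    return job["c"] * job["mu"]


def cmu_theta_index(job):
    """Expected-remaining-cost index: c * mu divided by the departure rate theta."""
    return job["c"] * job["mu"] / job["theta"]


def pick_next(jobs, index):
    """Return the name of the waiting job with the largest index value."""
    return max(jobs, key=index)["name"]


# Hypothetical jobs: 'ad' is costly now but leaves quickly on its own,
# 'post' is cheaper per unit time but lingers much longer.
jobs = [
    {"name": "ad", "c": 4.0, "mu": 1.0, "theta": 2.0},
    {"name": "post", "c": 3.0, "mu": 1.0, "theta": 0.5},
]
print(pick_next(jobs, cmu_index))        # ad   (4.0 vs 3.0)
print(pick_next(jobs, cmu_theta_index))  # post (2.0 vs 6.0)
```

Both indices here are computed from fixed per-job numbers, whereas the abstract's setting has costs that evolve with a job's Markov state, which is where it reports both rules to be suboptimal.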

2503.18721 2026-06-04 math.ST cs.CR cs.LG stat.ME stat.ML stat.TH

Differentially Private Joint Independence Test

差分隐私联合独立性检验

Xingwei Liu, Yuexin Chen, Jin-Ting Zhang, Wangli Xu

发表机构 * Center for Applied Statistics and School of Statistics, Renmin University of China(应用统计中心和中国人民大学统计学院) Department of Statistics and Data Science, National University of Singapore(统计与数据科学系,新加坡国立大学)

AI总结 针对隐私约束下的多随机向量联合依赖检测问题,提出基于差分隐私置换的dHSIC检验方法,实现有效水平、点态一致性和极小极大最优功效。

Comments 57 pages, 7 figures

详情
AI中文摘要

多个随机向量之间的联合依赖识别在许多统计应用中扮演重要角色,其中数据可能包含敏感或机密信息。本文在差分隐私背景下考虑$d$变量希尔伯特-施密特独立性准则(dHSIC)。鉴于dHSIC经验估计的极限分布是复杂的高斯混沌,非隐私场景下的检验通常基于置换和自助法。为了在隐私约束下检测联合依赖,我们提出了一种采用差分隐私置换方法的基于dHSIC的检验程序。我们证明该方法具有隐私保证、有效水平和点态一致性,而自助法存在功效不一致的问题。我们进一步研究了所提检验在dHSIC和$L_2$度量下的均匀功效,表明该检验在不同隐私机制下达到极小极大最优功效。作为副产品,我们证明了Pfister等人(2018)提出的非隐私置换dHSIC检验是我们差分隐私置换检验的特例,并且我们的结果也建立了其点态和均匀功效——从而解决了该工作中的开放问题。因果推断中的数值模拟和真实数据分析表明,我们提出的检验在实证中表现良好。

英文摘要

Identification of joint dependence among several random vectors plays an important role in many statistical applications, where the data may contain sensitive or confidential information. In this paper, we consider the $d$-variable Hilbert-Schmidt independence criterion (dHSIC) in the context of differential privacy. Given that the limiting distribution of the empirical estimate of dHSIC is a complicated Gaussian chaos, constructing tests in the non-private regime is typically based on permutation and bootstrap methods. To detect joint dependence under privacy constraints, we propose a dHSIC-based testing procedure employing a differentially private permutation methodology. We show that our method enjoys privacy guarantees, a valid level, and pointwise consistency, whereas the bootstrap counterpart suffers from inconsistent power. We further investigate the uniform power of the proposed test under the dHSIC and $L_2$ metrics, showing that the proposed test attains the minimax optimal power across different privacy regimes. As a byproduct, we show that the non-private permutation dHSIC test proposed in Pfister et al. (2018) is a special case of our differentially private permutation test, and our results also establish its pointwise and uniform power--thus resolving an open problem from that work. Both numerical simulations and real data analysis in causal inference suggest that our proposed test performs well empirically.

2503.21469 2026-06-04 eess.IV cs.CV

Embedding Compression Distortion in Video Coding for Machines

面向机器的视频编码中的嵌入压缩失真

Yuxiao Sun, Yao Zhao, Meiqin Liu, Chao Yao, Weisi Lin

发表机构 * Beijing Jiaotong University, China(北京交通大学) University of Science and Technology Beijing, China(北京科技大学) Nanyang Technological University, Singapore(南洋理工大学)

AI总结 提出压缩失真表示嵌入(CDRE)框架,通过提取机器感知相关的失真表示并嵌入下游模型,提升压缩视频的任务性能。

详情
AI中文摘要

目前,视频传输不仅服务于人类视觉系统(HVS)以供观看,还服务于机器感知以供分析。然而,现有的编解码器主要针对像素域和HVS感知指标进行优化,而非机器视觉任务的需求。为解决此问题,我们提出了一种压缩失真表示嵌入(CDRE)框架,该框架提取与机器感知相关的失真表示,并将其嵌入下游模型,从而解决压缩过程中丢失的信息并提升任务性能。具体而言,为了更好地分析与机器感知相关的失真,我们设计了一个压缩敏感提取器,用于在特征域中识别压缩退化。为了实现高效传输,引入了一个轻量级失真编解码器,将失真信息压缩为紧凑表示。随后,该表示被逐步嵌入下游模型,使其更好地了解压缩退化并提升性能。在各种编解码器和下游任务上的实验表明,我们的框架能够以最小的比特率、执行时间和参数数量开销,有效提升现有编解码器的率-任务性能。我们的代码和补充材料发布在 https://github.com/Ws-Syx/CDRE/。

英文摘要

Currently, video transmission serves not only the Human Visual System (HVS) for viewing but also machine perception for analysis. However, existing codecs are primarily optimized for pixel-domain and HVS-perception metrics rather than the needs of machine vision tasks. To address this issue, we propose a Compression Distortion Representation Embedding (CDRE) framework, which extracts machine-perception-related distortion representation and embeds it into downstream models, addressing the information lost during compression and improving task performance. Specifically, to better analyze the machine-perception-related distortion, we design a compression-sensitive extractor that identifies compression degradation in the feature domain. For efficient transmission, a lightweight distortion codec is introduced to compress the distortion information into a compact representation. Subsequently, the representation is progressively embedded into the downstream model, enabling it to be better informed about compression degradation and enhancing performance. Experiments across various codecs and downstream tasks demonstrate that our framework can effectively boost the rate-task performance of existing codecs with minimal overhead in terms of bitrate, execution time, and number of parameters. Our code and supplementary materials are released at https://github.com/Ws-Syx/CDRE/.

2503.06525 2026-06-04 cs.CY cs.AI

From Motion Signals to Insights: A Unified Framework for Student Behavior Analysis and Feedback in Physical Education Classes

从运动信号到洞察:体育课堂学生行为分析与反馈的统一框架

Xian Gao, Jiacheng Ruan, Jingsheng Gao, Mingye Xie, Zongyun Zhang, Ting Liu, Yuzhuo Fu

发表机构 * Shanghai Jiao Tong University(上海交通大学)

AI总结 提出一个基于运动信号和大型语言模型的端到端统一框架,用于体育课堂学生行为分析,自动生成教学洞察和改进建议。

Comments Work in progress

详情
AI中文摘要

在教育场景中分析学生行为对于提高教学质量和学生参与度至关重要。现有的基于AI的模型通常依赖课堂视频录像来识别和分析学生行为。虽然这些基于视频的方法可以部分捕捉和分析学生动作,但在体育课堂中,由于活动在户外开放空间进行且活动多样,它们难以准确跟踪每个学生的动作,并且难以泛化到这些场景中涉及的专业技术动作。此外,当前方法通常缺乏整合专业教学知识的能力,限制了它们提供深入的学生行为洞察和优化教学设计反馈的能力。为了解决这些限制,我们提出了一个统一的端到端框架,该框架利用基于运动信号的人类活动识别技术,结合先进的大型语言模型,对体育课堂中的学生行为进行更详细的分析和反馈。我们的框架从教师的教学设计和学生在体育课期间的运动信号开始,最终生成带有教学洞察和改进建议的自动化报告,以优化学习和课堂教学。该解决方案提供了一种基于运动信号的方法,用于分析学生行为并优化针对体育课堂的教学设计。实验结果表明,我们的框架能够准确识别学生行为并产生有意义的教学洞察。

英文摘要

Analyzing student behavior in educational scenarios is crucial for enhancing teaching quality and student engagement. Existing AI-based models often rely on classroom video footage to identify and analyze student behavior. While these video-based methods can partially capture and analyze student actions, they struggle to accurately track each student's actions in physical education classes, which take place in outdoor, open spaces with diverse activities, and are challenging to generalize to the specialized technical movements involved in these settings. Furthermore, current methods typically lack the ability to integrate specialized pedagogical knowledge, limiting their ability to provide in-depth insights into student behavior and offer feedback for optimizing instructional design. To address these limitations, we propose a unified end-to-end framework that leverages human activity recognition technologies based on motion signals, combined with advanced large language models, to provide more detailed analysis of, and feedback on, student behavior in physical education classes. Our framework begins with the teacher's instructional designs and the motion signals from students during physical education sessions, ultimately generating automated reports with teaching insights and suggestions for improving both learning and class instruction. This solution provides a motion signal-based approach for analyzing student behavior and optimizing instructional design tailored to physical education classes. Experimental results demonstrate that our framework can accurately identify student behaviors and produce meaningful pedagogical insights.

2502.05349 2026-06-04 math.OC cs.LG

Contextual Scenario Generation for Two-Stage Stochastic Programming

两阶段随机规划的情境生成

David Islip, Roy H. Kwon, Sanghyeon Bae, Woo Chang Kim

发表机构 * Department of Mechanical and Industrial Engineering, University of Toronto(机械与工业工程系,多伦多大学) Department of Industrial and Systems Engineering, Korea Advanced Institute of Science and Technology (KAIST)(工业与系统工程系,韩国科学技术院(KAIST))

AI总结 针对两阶段随机规划中情境数量大、部署受限的问题,提出两种情境生成方法(基于分布和基于任务),通过上下文信息学习生成少量替代情境,并保证决策质量。

Comments 79 pages, 12 figures

详情
AI中文摘要

两阶段随机规划(2SPs)广泛用于不确定性下的决策,但其实际部署通常受限于需要大量情境来近似不确定结果的条件分布。我们研究情境生成:给定上下文信息,学习生成一个小的、用户指定的替代情境集,当将其作为2SP的输入时,能产生高质量的2SP决策。现有的情境生成方法要么忽略上下文信息,要么在此设置下计算负担沉重。我们提出上下文情境生成(CSG),它学习从上下文到一组替代情境的映射。我们开发了两种互补的方法:(i)基于分布的方法,通过最小化与条件分布的基于核的距离来学习从上下文到情境的映射;(ii)基于任务的方法,通过区分下游2SP目标的代理来优化决策质量。这两种方法都广泛适用,仅需要重复求解底层子问题和在生成的情境上定义的2SP。我们提供了有限样本泛化保证,并在多个2SP类别上展示了强大的实证性能。

英文摘要

Two-stage stochastic programs (2SPs) are widely used for decision-making under uncertainty, but their practical deployment is often limited by the large number of scenarios needed to approximate the conditional distribution of uncertain outcomes. We study contextual scenario generation: given contextual information, learn to produce a small, user-specified set of surrogate scenarios that, when used as input into the 2SP, lead to high-quality 2SP decisions. Existing scenario generation methods either ignore contextual information or are computationally burdensome in this setting. We propose contextual scenario generation (CSG), which learns a mapping from context to a set of surrogate scenarios. We develop two complementary methodologies: (i) a distributional approach that learns a mapping from context to scenarios by minimizing a kernel-based distance to the conditional distribution, and (ii) a task-based approach that selects the mapping to optimize decision quality via differentiating through a learned surrogate of the downstream 2SP objective. Both approaches are broadly applicable and require only repeated solution of the underlying subproblems and 2SPs defined on the generated scenarios. We provide finite-sample generalization guarantees and demonstrate strong empirical performance across multiple 2SP classes.
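
The distributional approach described above scores a small scenario set by a kernel-based distance to the conditional distribution. The abstract does not name the distance, so the sketch below assumes the maximum mean discrepancy (MMD) with a Gaussian kernel, and it drops the context entirely: it only compares fixed candidate scenario sets against samples. The names `rbf_kernel`, `mmd_squared`, and `bandwidth` are illustrative and are not taken from the paper.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a (n, d) and b (m, d)."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd_squared(scenarios, samples, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two point sets."""
    k_ss = rbf_kernel(scenarios, scenarios, bandwidth).mean()
    k_xx = rbf_kernel(samples, samples, bandwidth).mean()
    k_sx = rbf_kernel(scenarios, samples, bandwidth).mean()
    return k_ss + k_xx - 2.0 * k_sx


# Assumed toy data: 200 draws standing in for the uncertain outcomes.
rng = np.random.default_rng(0)
samples = rng.normal(size=(200, 2))

# Two candidate sets of three surrogate scenarios each.
near_set = np.array([[0.0, 0.0], [1.0, 0.0], [-1.0, 0.0]])
far_set = near_set + 5.0  # same shape, shifted away from the data

near_score = mmd_squared(near_set, samples)
far_score = mmd_squared(far_set, samples)
```

A lower score means the three scenarios sit closer to the sampled distribution, so `near_set` would be preferred here. A contextual version would replace the fixed candidate sets with the output of a learned map from context to scenarios and minimize this score over the map's parameters.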

2412.03008 2026-06-04 cs.SI cs.DS cs.LG

Local Clustering on Complex Graphs and Complex Hypergraphs

Zihao Li, Dongqi Fu, Hengyu Liu, Jingrui He

Affiliations * University of Illinois Urbana-Champaign; Meta

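AI summary By extending the non-approximating Andersen-Chung-Lang (ACL) clustering algorithm, the paper proposes two algorithms: GeneralACL for weighted, directed, self-looped graphs, and HyperACL for hypergraphs with edge-dependent vertex weights. It proves that, under mild conditions, both identify clusters that are quadratically optimal in conductance.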

Comments KDD 2026, Preprint version. 26 pages

Details
Abstract

Local/seeded clustering aims to find a compact cluster near the given starting instances. While most existing studies on graph clustering assume a discrete graph setting (i.e., unweighted, undirected graphs without self-loops), real-world graphs can be more complex. In this paper, we extend the classic non-approximating Andersen-Chung-Lang (ACL) clustering algorithm beyond discrete graphs and generalize its quadratic optimality to a wider range of complex graphs, including weighted, directed, and self-looped graphs and hypergraphs with edge-dependent vertex weights. Specifically, by leveraging PageRank, we propose two algorithms: GeneralACL for graphs and HyperACL for hypergraphs. We prove that, under two mild conditions, both algorithms can identify a quadratically optimal cluster in terms of conductance. Additionally, we provide experiments to validate our theoretical findings. Our code is available at https://github.com/iDEA-iSAIL-Lab-UIUC/HyperACL.
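
For orientation, the classic ACL procedure that the abstract takes as its starting point can be sketched as a push-based approximate personalized PageRank followed by a sweep cut on conductance. This is not GeneralACL or HyperACL: the sketch assumes the plain setting of an unweighted, undirected graph without self-loops, stored as an adjacency dictionary. The function names and the values of `alpha` and `eps` are illustrative assumptions.

```python
from collections import deque


def approximate_ppr(adj, seed, alpha=0.15, eps=1e-4):
    """Push-based approximate personalized PageRank with a lazy walk."""
    p, r = {}, {seed: 1.0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        deg_u = len(adj[u])
        ru = r.get(u, 0.0)
        if ru < eps * deg_u:
            continue  # stale queue entry, residual already below threshold
        # Push: keep alpha of the residual, hold half of the rest, spread the rest.
        p[u] = p.get(u, 0.0) + alpha * ru
        r[u] = (1 - alpha) * ru / 2
        share = (1 - alpha) * ru / (2 * deg_u)
        for v in adj[u]:
            before = r.get(v, 0.0)
            r[v] = before + share
            if before < eps * len(adj[v]) <= r[v]:
                queue.append(v)
        if r[u] >= eps * deg_u:
            queue.append(u)
    return p


def sweep_cut(adj, p):
    """Return the prefix of the degree-normalized ranking with lowest conductance."""
    total_vol = sum(len(nbrs) for nbrs in adj.values())
    order = sorted(p, key=lambda u: p[u] / len(adj[u]), reverse=True)
    in_set, vol, cut = set(), 0, 0
    best_set, best_phi = set(), float("inf")
    for u in order:
        in_set.add(u)
        vol += len(adj[u])
        # Edges to nodes already inside stop being cut; the others become cut.
        inside = sum(1 for v in adj[u] if v in in_set)
        cut += len(adj[u]) - 2 * inside
        denom = min(vol, total_vol - vol)
        if denom > 0 and cut / denom < best_phi:
            best_phi, best_set = cut / denom, set(in_set)
    return best_set, best_phi


# Assumed toy graph: two 4-cliques {0,1,2,3} and {4,5,6,7} joined by the edge 3-4.
adj = {u: set() for u in range(8)}
for block in ([0, 1, 2, 3], [4, 5, 6, 7]):
    for a in block:
        for b in block:
            if a != b:
                adj[a].add(b)
adj[3].add(4)
adj[4].add(3)

cluster, phi = sweep_cut(adj, approximate_ppr(adj, seed=0))
```

On this toy graph the sweep recovers the seed's clique, whose single cut edge over a volume of 13 gives the lowest conductance among the prefixes. Handling edge weights, edge directions, self-loops, or hyperedges requires the generalized definitions developed in the paper and is outside this sketch.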