arXivDaily: arXiv daily academic digest, updated Monday through Friday
cs.GT Game Theory 19
2606.12260 2026-06-11 econ.TH cs.AI cs.GT cs.LG stat.ML New submission

Market Design for AI: Beyond the Copyright Binary

Yan Dai, Maryam Farboodi, Negin Golrezaei, Sepehr Shahshahani

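AI summary: Using static and dynamic game models, this paper analyzes why both the "free-for-all" and "strong intellectual property rights" regimes fail in the market for AI training data, and proposes a market design in which a data intermediary internalizes externalities and subsidizes innovative contributions.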

Abstract

How can we design a market of human-generated content for use in training AI models that both enables technological progress and preserves individual incentives for high-quality content creation? Existing approaches take polar positions: a "free-for-all" model based on fair use and a "strong intellectual property rights" model. We show that both fail: Free-for-all does not compensate creators, and -- by modeling as a static Stackelberg game -- strong intellectual property rights also underpower creative incentives. We find this especially true for more innovative creators, a phenomenon we term the "originality penalty." Extending this insight to a dynamic model, we find another market failure undermining AI model performance, even for an initially good model: Such a model induces greater reliance by humans on AI-assisted creation, resulting in homogenized content feeding back into training, which degrades the model performance -- a "curse of precision." We further propose a market design with a data intermediary internalizing cross-creator externalities and subsidizing innovative contributions, thereby restoring efficiency.

2606.12187 2026-06-11 cs.GT New submission

Strategic Facility Location with $p$-Norm Social Costs

Jabari Hastings

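AI summary: Studies strategic facility location in $\ell_q(\mathbb R^d)$ spaces where the social cost is an arbitrary $p$-norm, analyzes the approximation ratio of the strategyproof coordinate-wise median mechanism, and gives tight bounds for $d = 2$ and upper bounds for $d \geq 3$.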

Abstract

We consider the strategic facility location problem in $\ell_q(\mathbb R^d)$ spaces where the social cost is defined by an arbitrary $p$-norm of the individual costs. While the optimal approximation ratios for deterministic strategyproof mechanisms are well established in the $d = 1$ setting, the guarantees for multi-dimensional spaces under an arbitrary $p$-norm are less understood. In this work, we analyze the well-studied, strategyproof coordinate-wise median (CM) mechanism and provide approximation guarantees for these generalized social costs. For $d = 2$, we establish tight approximation ratios for all $p, q \geq 1$. In particular, we show that the CM mechanism is a $2^{1 - 1/ \max(p, q)}$-approximation, resolving a conjecture of Goel and Hann-Caruthers (Social Choice and Welfare, 2023). Furthermore, for $d\geq 3$, we give upper bounds on the approximation ratio of the CM mechanism for arbitrary $p$-norm social costs, generalizing the recent result of Gravin and Jia (STOC, 2025) for the utilitarian social cost. Remarkably, we show that this approximation ratio never exceeds 3, regardless of the dimension.
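
A minimal Python sketch of the coordinate-wise median mechanism and the $p$-norm social cost described in this abstract. The function names, the lower-median tie-break for an even number of agents, and the sample reports are illustrative assumptions, not taken from the paper; only the $d = 2$ ratio $2^{1 - 1/\max(p, q)}$ comes from the abstract.

```python
from statistics import median_low

def coordinate_median(reports):
    """Facility location: the median of the reports in each coordinate."""
    dims = range(len(reports[0]))
    # Assumption: median_low breaks ties when the number of agents is even.
    return tuple(median_low([r[k] for r in reports]) for k in dims)

def lq_distance(x, y, q):
    """Distance between x and y in the l_q norm, for finite q >= 1."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)

def social_cost(facility, reports, p, q):
    """p-norm (finite p >= 1) of the agents' l_q distances to the facility."""
    costs = [lq_distance(facility, r, q) for r in reports]
    return sum(c ** p for c in costs) ** (1.0 / p)

def cm_ratio_2d(p, q):
    """Approximation ratio 2^(1 - 1/max(p, q)) quoted in the abstract for d = 2."""
    return 2.0 ** (1.0 - 1.0 / max(p, q))

reports = [(0.0, 0.0), (4.0, 1.0), (1.0, 5.0)]  # hypothetical agent locations
facility = coordinate_median(reports)
print(facility, social_cost(facility, reports, p=1, q=1), cm_ratio_2d(1, 1))
```

For these three agents with $p = q = 1$, the mechanism places the facility at (1, 1), where the social cost is 9. The last function only evaluates the stated ratio; it does not verify it.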

2606.12167 2026-06-11 cs.GT New submission

Shared Infrastructure Investment and Pricing: Stackelberg Equilibria in Risk-Aware Take-or-Pay Contracts

Amal Sakr, Andrea Araldo, Tamer Başar, Tijani Chahed

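AI summary: Studies shared-infrastructure investment and pricing between an infrastructure provider and multiple risk-averse firms with uncertain revenues; using a Stackelberg game and a Conditional Value-at-Risk model, it proves equilibrium existence and gives a polynomial-time approximation algorithm.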

Abstract

We study a shared infrastructure deployed by an Infrastructure Provider (InP) and used by multiple firms that generate revenues through resource usage. We focus on a challenging setting where: (i) infrastructure deployment requires substantial upfront investment, which the InP must recover via payments by firms that depend on their uncertain future revenues; (ii) firms' resource usage is jointly influenced by exogenous factors, infrastructure pricing, operational costs, and resource congestion; and (iii) firms exhibit heterogeneous risk aversion. This setting is typical in emerging technologies, e.g., Mobile Edge Computing (MEC). We formalize this setting as a novel Stackelberg game with risk-aware take-or-pay contracting and firm-side operational and congestion costs, in which the InP acts as the leader and jointly optimizes capacity dimensioning and access pricing, while firms act as followers that share the infrastructure and commit upfront to future resource usage under uncertain revenues. Followers' heterogeneous risk aversion is modeled through Conditional Value-at-Risk (CVaR). We prove the existence of a Stackelberg equilibrium (SE), in which the followers' decisions constitute a generalized Nash equilibrium, and develop a polynomial-time algorithm that computes an approximate SE with a bounded optimality gap. We also derive a lower bound on the followers' Probability of Profit (PoP). Monte Carlo simulations for a MEC case study show that higher followers' risk aversion reduces infrastructure capacity, pricing, and leader profit, while increasing followers' PoP.
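
The abstract models followers' risk aversion through CVaR. As a hedged illustration, the sketch below uses one common sample-based estimator: the mean of the worst $(1 - \alpha)$ fraction of sampled losses. The loss convention (larger is worse), the function name, and the sample data are assumptions; the paper's exact formulation is not given in the abstract.

```python
import math

def empirical_cvar(losses, alpha):
    """Mean of the worst (1 - alpha) fraction of the sampled losses."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    ordered = sorted(losses, reverse=True)  # largest (worst) losses first
    # Rounding guards against floating-point noise before taking the ceiling.
    k = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    tail = ordered[:k]
    return sum(tail) / len(tail)

losses = [float(v) for v in range(1, 11)]  # hypothetical loss samples 1..10
print(empirical_cvar(losses, alpha=0.8))   # averages the two worst samples
print(empirical_cvar(losses, alpha=0.0))   # plain mean of all samples
```

A more risk-averse follower would correspond to a larger `alpha`, which weights only the worst outcomes.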

2606.12149 2026-06-11 cs.GT New submission

Do Not Discretize, Optimize: Almost Greedy Fictitious Play

Evangelos Markakis, Christodoulos Santorinaios

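AI summary: Proposes the Almost Greedy Fictitious Play variant, which greedily optimizes the step size over a constrained search space and achieves an instance-dependent O(1/T) convergence rate for the duality gap in zero-sum games.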

Comments
18 pages, 7 figures

Abstract

Our work revolves around Fictitious Play, one of the first iterative methods that is known to converge to a Nash equilibrium in zero-sum games. In recent years, there has been a revived interest, due to applications in various machine learning problems, which has motivated a line of work on its convergence properties and on proposing new variants of the initial algorithm. Our paper is along this direction and introduces one new variant, which we refer to as Almost Greedy Fictitious Play. The proposed algorithm greedily attempts to find the optimal stepsize at each iteration, but its search space is constrained and includes almost the entire line between the cumulative mixed strategy and the current best response. Our main result is that the method achieves an instance-dependent convergence rate of $\mathcal{O}(1/T)$ with respect to the duality gap. This matches the rate of Continuous Fictitious Play, and offers an alternative to discretization. We complement our theoretical findings with experiments that demonstrate the effectiveness of the method.
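
For orientation, here is a small Python sketch of classical Fictitious Play on a zero-sum matrix game, together with the duality gap that the convergence rate refers to. It uses the standard step size $1/(t+1)$ and is not the Almost Greedy variant, whose step-size search is not specified in the abstract. The function names and the matching-pennies example are assumptions made for illustration.

```python
def best_response_row(A, y):
    """Row maximizing the row player's expected payoff against column mix y."""
    payoffs = [sum(a * yj for a, yj in zip(row, y)) for row in A]
    return max(range(len(A)), key=payoffs.__getitem__)

def best_response_col(A, x):
    """Column minimizing the row player's expected payoff against row mix x."""
    n_cols = len(A[0])
    payoffs = [sum(x[i] * A[i][j] for i in range(len(A))) for j in range(n_cols)]
    return min(range(n_cols), key=payoffs.__getitem__)

def duality_gap(A, x, y):
    """max_i (A y)_i - min_j (x^T A)_j; it is zero at a Nash equilibrium."""
    row_vals = [sum(a * yj for a, yj in zip(row, y)) for row in A]
    col_vals = [sum(x[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]
    return max(row_vals) - min(col_vals)

def fictitious_play(A, iterations):
    """Simultaneous fictitious play with the classical step size 1 / (t + 1)."""
    x = [1.0] + [0.0] * (len(A) - 1)      # start from the first pure strategies
    y = [1.0] + [0.0] * (len(A[0]) - 1)
    for t in range(1, iterations + 1):
        i, j = best_response_row(A, y), best_response_col(A, x)
        step = 1.0 / (t + 1)
        # Move the running average toward the current best response.
        x = [(1 - step) * xi + (step if k == i else 0.0) for k, xi in enumerate(x)]
        y = [(1 - step) * yj + (step if k == j else 0.0) for k, yj in enumerate(y)]
    return x, y

A = [[1.0, -1.0], [-1.0, 1.0]]            # matching pennies (row player's payoffs)
x, y = fictitious_play(A, 2000)
print(duality_gap(A, x, y))
```

Each update is a convex combination of the running average and a pure best response, which is the line the abstract refers to. A greedy variant would choose `step` by a search along that line instead of fixing it in advance.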

2606.12039 2026-06-11 cs.GT New submission

Axiomatic Tools for Separating Electoral Control Types, with Applications to Concrete Systems

Michael C. Chavrimootoo, Ian Clingerman, Ethan Ferland, Erin Gibson, Lane A. Hemaspaandra, Quan Luu, David E. Narvaez, Yanfei Wang

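AI summary: Proposes axiomatic methods for automatically separating electoral control types, finds 64 new collapses and 1901 new separations across seven voting systems, and gives universal separation results.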

Abstract

Electoral control is the study of whether an attacker, by structural changes on an election such as adding/deleting/partitioning voters or candidates, can affect the winner in some desired way. Forty-four such attack types are often considered standard, and recently there has been work showing that sometimes the attack types -- though seemingly distinct -- in fact "collapse," that is, for every input, either the attacker can achieve their goal under both of the control types or under neither of the control types. The papers doing this, however, while often exploiting axiomatic results that ensured collapses, found all the separations by human or computer-generated counterexamples. This left open the issue of whether even the separation direction can be driven by axiomatic results that allow large groups of separations to be almost automatically obtained. Our paper provides many such results, and we apply them to seven important voting systems, finding sixty-four new collapses and 1901 new separations. We not only give axiomatic sufficient conditions and one complete characterization result, but also identify some control-problem pairs that universally separate -- in other words, they separate under every voting rule.

2606.12005 2026-06-11 cs.GT cs.IT New submission

Game-Theoretic Latent Space Alignment for Multi-user Semantic MIMO Communications

Giuseppe Di Poce, Mattia Merluzzi, Emilio Calvanese Strinati, Paolo Di Lorenzo

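AI summary: To address semantic mismatch in multi-user semantic MIMO interference networks, proposes a non-cooperative game framework that jointly optimizes linear semantic MIMO transceivers in closed form and designs an iterative semantic water-filling algorithm, achieving latent-space alignment and interference management.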

Abstract

Semantic communications enable AI-native wireless systems by mapping raw data into compressed task-oriented latent representations. However, independently trained agents often rely on heterogeneous latent spaces and background knowledge, leading to semantic mismatch that degrades mutual understanding and downstream task execution, especially in interference-limited multi-user wireless networks. This paper investigates distributed latent-space alignment in multi-user semantic MIMO interference networks with cognitive radio constraints. We consider primary users and semantic-aware secondary users sharing the same wireless resources, where secondary agents must simultaneously mitigate interference and align heterogeneous semantic representations. To address this problem, we formulate semantic alignment as a non-cooperative game and derive a closed-form solution for the joint optimization of linear semantic MIMO transceivers under power and interference constraints. Exploiting the structure of the problem, we recast the original matrix-valued optimization into a lower-dimensional power-allocation game, leading to an iterative semantic water-filling algorithm. We establish sufficient conditions for existence, uniqueness, and global convergence to a Nash equilibrium, explicitly relating semantic alignment properties and physical-channel interactions. Numerical results assess the performance of the proposed framework, revealing key trade-offs among semantic compression, task performance, and hierarchical spectrum access.
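
The abstract does not spell out the semantic water-filling update. As background only, the sketch below shows classical single-user water-filling, which maximizes $\sum_i \log(1 + g_i p_i)$ subject to a total power budget; treating it as the building block of the paper's algorithm is an assumption, and the interference constraints and multi-user iteration are omitted. All names and numbers are hypothetical.

```python
def water_filling(gains, total_power):
    """Classical water-filling: p_i = max(0, mu - 1/g_i) with sum(p_i) = total_power."""
    floors = sorted(1.0 / g for g in gains)   # inverse gains, best channel first
    mu = 0.0
    for k in range(len(floors), 0, -1):       # try keeping the k best channels active
        mu = (total_power + sum(floors[:k])) / k
        if mu > floors[k - 1]:                # water level clears the k-th floor
            break
    return [max(0.0, mu - 1.0 / g) for g in gains]

gains = [2.0, 1.0, 0.25]                      # hypothetical channel gains
powers = water_filling(gains, total_power=1.0)
print(powers)                                 # the weakest channel receives no power
```

In a game-theoretic setting, each user would repeat such an allocation against the interference created by the others until the allocations stop changing.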

2606.11397 2026-06-11 cs.GT econ.TH New submission

Invariant Price of Anarchy and Multiplicative Smoothness

Ilia Shilov, Heinrich H. Nax, Saverio Bolognani

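AI summary: For the Cardinal Non-Comparability framework, proposes a multiplicative smoothness condition, derives invariant Price of Anarchy bounds, and extends them to coarse correlated equilibria.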

Abstract

The Price of Anarchy (PoA) is a popular measure of the costs of decentralization in terms of efficiency losses. Almost all PoA analyses operate within a framework assuming both Cardinal Full-Comparability (CFC) and smoothness, in which case any derived bounds conveniently extend beyond pure Nash to coarse correlated equilibria and no-regret learning outcomes. However, interpersonal utility comparability is an additional assumption that generally has to be justified. Without it, cardinal utilities (e.g., defined under the classical von Neumann--Morgenstern framework) are unique only up to agent-specific affine transformations, rendering both the utilitarian PoA and the classical smoothness conditions representation-dependent. In this paper, we operate under a more general Cardinal Non-Comparability (CNC) framework, under which the weighted Nash welfare is a canonical admissible aggregator. We introduce multiplicative smoothness, a product-form condition matched to the multiplicative structure of Nash welfare, and obtain PoA bounds that are CNC-invariant and extend to coarse correlated equilibria. We demonstrate the applicability of our framework on single-choice welfare games, deriving the bounds through a simple proof relying on a multiplicative retention envelope and geometric closure. The interpretation of this bound in terms of the true cost of decentralization depends crucially on interpersonal comparability of utilities.

2606.11288 2026-06-11 cs.GT cs.IT New submission

An Entropy-based Framework for Hybrid Coalitions in Game Theory. Part I: Human Arbitration

Salome A. Sepulveda-Fontaine, Jose M. Amigo

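AI summary: Proposes the NeoGame Theory framework, which defines a delegation rule based on the Jensen-Shannon divergence between Human and AI policies so that execution authority can alternate within hybrid coalitions, and establishes a frequency convergence equilibrium.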

Comments
29 pages, 2 figures (the second with four panels)

Abstract

Classical Game Theory underpins much of AI and multiagent research, but hybrid Human-AI systems require a framework in which execution authority can alternate within a digital environment. We introduce NeoGame Theory, an extension of classical Game Theory for hybrid Human-AI coalitions operating under Virtual Nature, the algorithmic analogue of classical (physical) Nature. The framework combines a lexicographic coalition utility with a delegation rule based on the Jensen-Shannon divergence between Human and AI policies. Two thresholds define agreement, contextual, and disagreement regions. In the contextual region, execution follows a scenario-specific rule. Apart from the theory, in this paper we develop the first regime, Human arbitration, in which the AI learns by observation and frequency matching while the Human retains final execution authority. We establish the axiomatic basis of the framework and characterize a frequency convergence equilibrium, providing the foundation for later extensions and computational validation.
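
The two-threshold delegation rule can be sketched in Python as follows. The Jensen-Shannon divergence is standard (base-2 logarithm, so values lie in $[0, 1]$); the threshold values, the choice of which side of each threshold is inclusive, and the function names are assumptions, since the abstract does not state them.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete policies over the same actions."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def region(human_policy, ai_policy, low, high):
    """Classify a Human/AI policy pair by two thresholds (hypothetical values)."""
    d = js_divergence(human_policy, ai_policy)
    if d <= low:
        return "agreement"
    if d <= high:
        return "contextual"
    return "disagreement"

print(region([0.5, 0.5], [0.5, 0.5], low=0.1, high=0.5))  # identical policies
print(region([0.9, 0.1], [0.5, 0.5], low=0.1, high=0.5))  # moderate divergence
print(region([1.0, 0.0], [0.0, 1.0], low=0.1, high=0.5))  # disjoint support
```

Because the mixture `m` is positive wherever either policy is, the divergence stays finite even when the two policies have disjoint support.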

2606.11284 2026-06-11 cs.MA cs.GT cs.LG New submission

Phi-Actor-Critic: Steering General-Sum Games to Pareto-Efficient Correlated Equilibria

Wongyu Lee, Francesco Lelli, Omran Ayoub, Massimo Tornatore

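AI summary: Proposes the Φ-Actor-Critic framework, which uses swap regret minimization to steer multi-agent learning toward high-welfare correlated equilibria, employs a centralized attention critic to estimate counterfactual regrets efficiently, and adds a Lagrangian-based equilibrium selection mechanism to optimize social welfare.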

Comments
Accepted to IJCAI 2026

Abstract

Real-world multi-agent systems, from traffic coordination to resource allocation, are often modeled as general-sum games where individual incentives conflict with collective welfare. In these settings, the central challenge is not merely finding an equilibrium, but selecting socially desirable outcomes among many suboptimal Nash equilibria. Standard deep multi-agent reinforcement learning (MARL) methods struggle with this problem, as value-decomposition approaches are constrained by monotonicity assumptions and policy-gradient methods often converge to stable but socially inefficient equilibria. To address this limitation, we propose $\Phi$-Actor-Critic ($\Phi$-AC), a framework that leverages swap regret minimization to steer learning toward high-welfare correlated equilibria (CE). To make counterfactual regret estimation tractable in deep MARL, $\Phi$-AC employs a centralized attention critic that predicts vector-valued regrets in a single forward pass, avoiding computationally expensive counterfactual simulations. We further introduce a Lagrangian-based equilibrium selection mechanism that optimizes social welfare while enforcing stability through regret constraints. Experiments on matrix games, Multi-Agent Particle Environments (MPE), and the Melting Pot Harvest scenario demonstrate that $\Phi$-AC learns efficient and stable coordination strategies across diverse mixed-motive settings while maintaining high collective return and competitive fairness.

2606.11242 2026-06-11 cs.GT New submission

Game-Theoretic Foundations of Competition for Conscious Access

Efthyvoulos Drousiotis, Paul Spirakis, Sotiris Nikoletseas

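AI summary: Builds a game-theoretic model in which internal modules compete for a scarce broadcast slot by choosing amplification effort, and analyzes equilibrium existence, capture conditions, computational efficiency, and mechanism-level limitations.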

Abstract

Conscious access in the human brain is often described as the outcome of a competition among candidate representations, but this competition is usually left at the level of mechanism or metaphor rather than analyzed as a strategic allocation problem. We introduce an access contest in which internal modules compete for a scarce broadcast slot by choosing a costly amplification effort. Access is allocated by a smooth probabilistic rule, allowing the model to interpolate between diffuse selection and winner-take-all competition. We establish pure-strategy equilibrium existence under standard convexity and bounded-benefit assumptions, and give sufficient conditions for uniqueness using diagonal strict concavity. We then analyze capture in the two-module case, and for quadratic costs, we derive a sharp threshold in the competition intensity above which capture occurs. For strongly convex costs, we prove an if-and-only-if capture criterion in terms of the cost-adjusted amplification advantage of the lower-value module. Under the same curvature-dominance condition that guarantees uniqueness, we show that the unique pure Nash equilibrium of the general $M$-module access contest can be approximated efficiently by projected pseudo-gradient dynamics, with logarithmic dependence on the desired accuracy. Finally, we prove an impossibility theorem for single-slot access mechanisms: exact winner-take-all efficiency is incompatible with robustness to small score perturbations. Thus, smooth probabilistic access rules are not merely analytically convenient, but structurally motivated. These results provide a game-theoretic foundation for studying competition for conscious access, connecting equilibrium analysis, capture, computation, and mechanism-level limitations under a common formal model.

2606.11193 2026-06-11 cs.GT math.PR New submission

Approximation Properties of Evolutionary Dynamics in Continuous-Time Finite State Space Games

Pietro Grassi

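AI summary: Studies the convergence of finite-population stochastic evolutionary dynamics to their deterministic mean-field limit in continuous-time finite state space games; proves existence and uniqueness of the mean-field solution, approximation of Nash equilibria, and convergence in probability of the empirical distributions, with numerical simulations confirming the O(N^{-1/2}) convergence rate.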

Comments
Bachelor's project

Abstract

This thesis studies the convergence of finite-population stochastic evolutionary dynamics to their deterministic mean-field limit in continuous-time finite state space games. We first develop refined ergodic theorems for Markov chains with a single positive-recurrent class, guaranteeing the existence of a unique invariant distribution and almost-sure convergence of time averages. Next, we prove that the mean-field model, described by a system of Lipschitz-continuous ordinary differential equations, admits a unique solution that depends continuously on its initial condition and that constitutes the almost-sure limit for the empirical distributions with fixed policy. Furthermore, we show that every Mixed Stationary Nash Equilibrium of the mean-field game is approximated by a Nash equilibrium of the corresponding $N$-player game within an error $\epsilon$ for sufficiently large $N$. We finally demonstrate, by Kurtz's theorem, that the empirical state-policy distribution converges in probability to the mean-field trajectory. Numerical simulations conducted in MATLAB confirm the theoretical $\mathcal{O}(N^{-1/2})$ convergence rate in both models across a range of population sizes.

2606.01963 2026-06-11 cs.GT cs.IT math.PR Updated version

Improved Amenability Bounds for Local Coordination Games

Ron Peretz, Dean Kraizberg

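AI summary: Using Shapley values of a mutual-information game associated with the players' local outputs, sharpens the quantitative relationship between low inefficiency in local coordination games and graph amenability, proving that if the average disagreement is at most ε, the graph is (O(ε log(1/ε)), r)-amenable.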

Abstract

We study local pure coordination games on finite social networks, continuing the framework of Hutchcroft, Rospuskova, and Tamuz. They showed that low inefficiency in local coordination forces the underlying graph to be amenable, with a square-root loss in the amenability parameter. We improve this loss in the binary unbiased setting. Using Shapley values of a mutual-information game associated with the players' local outputs, we prove that if the average disagreement is at most $\varepsilon$, then the graph is $(O(\varepsilon\log(1/\varepsilon)),r)$-amenable. This gives a sharper quantitative converse between local coordination and graph amenability.

2605.12191 2026-06-11 cs.GT math.PR Updated version

Sure-almost-sure and Sure-limit-sure Window Mean Payoff in Markov Decision Processes

Pranshu Gaba, Shibashis Guha

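AI summary: Studies window mean-payoff objectives in Markov decision processes, solves the sure-almost-sure and sure-limit-sure problems, and analyzes computational complexity and strategy memory requirements for fixed and bounded window lengths.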

Comments
42 pages, 10 figures. Minor corrections

Abstract

Given rationals $\alpha$ and $\beta$, the sure-almost-sure problem for a threshold Boolean objective $\varphi$ in a Markov decision process (MDP) asks if one can simultaneously ensure that all outcomes of the MDP have $\varphi$-value at least $\alpha$ (i.e. sure $\alpha$ satisfaction) and with probability $1$ the outcome has $\varphi$-value at least $\beta$ (i.e. almost-sure $\beta$ satisfaction). The sure-limit-sure problem asks if for all $\varepsilon > 0$ one can simultaneously ensure that all outcomes have $\varphi$-value at least $\alpha$ and with probability at least $1 - \varepsilon$ the outcome has $\varphi$-value at least $\beta$. Moreover, if simultaneous satisfaction of objectives is possible, then one would also like to construct a strategy (for sure-almost-sure) or a family of strategies (for sure-limit-sure) that achieves this. In this paper, we solve the sure-almost-sure and sure-limit-sure problems for window mean-payoff objectives. The window mean-payoff objective strengthens the standard mean-payoff objective by requiring that eventually, from every point in the infinite run, the average payoff becomes greater than a given threshold within a finite window length. We study two variants of window mean payoff: in the fixed variant, the window length $\ell$ is given, while in the bounded variant, the length is not given but is required to be bounded throughout the run. We show that the sure-almost-sure problem and the sure-limit-sure problem are both in P for the fixed variant (if $\ell$ is given in unary) and are both in NP $\cap$ coNP for the bounded variant, matching the computational complexity of sure satisfaction and almost-sure satisfaction when considered separately for these objectives. We also give bounds for the memory requirement of winning strategies for all considered problems.
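
To make the fixed window mean-payoff condition concrete, the sketch below checks a finite prefix of a run: a position is satisfied if some window of length at most $\ell$ starting there has average payoff at least the threshold. Working on a finite prefix, using a non-strict comparison, and the function names are all simplifying assumptions; the actual objective is defined on infinite runs of an MDP and only needs to hold eventually.

```python
def window_closes(payoffs, start, max_len, threshold):
    """True if some window of length 1..max_len from `start` has mean >= threshold."""
    total = 0.0
    for length in range(1, max_len + 1):
        if start + length > len(payoffs):
            return False                  # ran off the prefix before the window closed
        total += payoffs[start + length - 1]
        if total / length >= threshold:
            return True
    return False

def open_positions(payoffs, max_len, threshold):
    """Positions with a full look-ahead of max_len steps whose window never closes."""
    last = len(payoffs) - max_len
    return [i for i in range(last + 1)
            if not window_closes(payoffs, i, max_len, threshold)]

payoffs = [-1, 3, -2, -2, 5, 0]           # hypothetical payoff sequence
print(open_positions(payoffs, max_len=2, threshold=0))
print(open_positions(payoffs, max_len=3, threshold=0))
```

With window length 2, position 2 stays open because both of its windows have negative average. Allowing length 3 lets the later payoff of 5 close it, which illustrates why the bounded variant, where the length is not fixed in advance, is weaker than the fixed one.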

2603.14867 2026-06-11 cs.LG cs.AI cs.GT cs.MA Updated version

Sample-Efficient Hypergradient Estimation for Decentralized Bi-Level Reinforcement Learning

Mikoto Kudo, Takumi Tanabe, Akifumi Wachi, Youhei Akimoto

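AI summary: For decentralized bi-level reinforcement learning in which the leader cannot intervene in the follower's optimization, proposes a hypergradient estimator based on the Boltzmann covariance trick that is sample-efficient in high-dimensional decision spaces and is applied, for the first time, to 2-player Markov games.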

Comments
29 pages. Extended version of the paper accepted to ICAPS 2026

Abstract

Many strategic decision-making problems, such as environment design for warehouse robots, can be naturally formulated as bi-level reinforcement learning (RL), where a leader agent optimizes its objective while a follower solves a Markov decision process (MDP) conditioned on the leader's decisions. In many situations, a fundamental challenge arises when the leader cannot intervene in the follower's optimization process; it can only observe the optimization outcome. We address this decentralized setting by deriving the hypergradient of the leader's objective, i.e., the gradient of the leader's strategy that accounts for changes in the follower's optimal policy. Unlike prior hypergradient-based methods that require extensive data for repeated state visits or rely on gradient estimators whose complexity can increase substantially with the high-dimensional leader's decision space, we leverage the Boltzmann covariance trick to derive an alternative hypergradient formulation. This enables efficient hypergradient estimation solely from interaction samples, even when the leader's decision space is high-dimensional. Additionally, to our knowledge, this is the first method that enables hypergradient-based optimization for 2-player Markov games in decentralized settings. Experiments highlight the impact of hypergradient updates and demonstrate our method's effectiveness in both discrete and continuous state tasks.

2603.25979 2026-06-11 cs.GT eess.SY Updated version

Move Over, Prisoner's Dilemma: Colonel Blotto has arrived

Keith Paarporn, Jason R. Marden

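AI summary: Introduces the Colonel Blotto game framework, surveys key analytical and computational results, and shows applications in cybersecurity, network defense, and multi-agent systems, focusing on three research directions: interdependent contest objectives, alternate winning rules, and multi-agent competitive environments.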

Abstract

The Prisoner's Dilemma, zero-sum games, LQR team problems, and differential games have shaped game theory in controls for decades, but the field's most pressing adversarial challenges demand a richer framework, and its name is Colonel Blotto. Strategic adversarial constraints represent a fundamental consideration in control systems, from cybersecurity defense to infrastructure protection. Colonel Blotto games, despite their direct relevance to such applications, remain underutilized in the controls community relative to other game-theoretic approaches. This article aims to close that gap for the controls community. Indeed, theoretical advances within the last two decades have spurred a resurgence of interest and enabled their applications across several domains. In this article, we introduce the Colonel Blotto framework, survey key analytical and computational results, and demonstrate how problems spanning cybersecurity, network defense, and multi-agent systems fit naturally within this structure. Three research directions are examined in depth: interdependent contest objectives that capture networked vulnerabilities, alternate winning rules that model partial rewards and structural asymmetries, and multi-agent competitive environments involving coalition formation and strategic concessions. Taken together, these directions reveal a framework that is both practically deployable and rich enough to capture the strategic complexity inherent in adversarial resource allocation.
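
For readers new to the model, a minimal Python sketch of the classical Colonel Blotto payoff: two players split a budget across battlefields, and each battlefield's value goes to whoever allocates more. The even split on ties, the function name, and the example allocations are assumptions for illustration; the alternate winning rules surveyed in the article would replace the comparison step.

```python
def blotto_payoffs(alloc_a, alloc_b, values):
    """Winner-take-all on each battlefield; ties split the value evenly (assumption)."""
    score_a = score_b = 0.0
    for a, b, v in zip(alloc_a, alloc_b, values):
        if a > b:
            score_a += v
        elif b > a:
            score_b += v
        else:
            score_a += v / 2.0
            score_b += v / 2.0
    return score_a, score_b

values = (1.0, 1.0, 1.0)                  # three equally valued battlefields
print(blotto_payoffs((4, 4, 2), (5, 0, 5), values))  # B concedes the middle field
print(blotto_payoffs((3, 3, 4), (3, 3, 4), values))  # identical splits tie everywhere
```

In the first call both players spend 10 units, yet player B wins two of three battlefields by conceding one, which is the kind of strategic concession the abstract mentions.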

2602.10456 2026-06-11 cs.GT econ.TH math.OC Updated version

Informal and Privatized Transit: Incentives, Efficiency and Coordination

Devansh Jalota, Matthew Tsao

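AI summary: Uses a game-theoretic framework to study efficiency losses caused by profit-maximizing driver behavior in informal transit systems, and proposes two mechanisms, cross-subsidization and fare optimization, to mitigate the inefficiency.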

Abstract

Informal and privatized transit services, such as minibuses and shared auto-rickshaws, are integral to daily travel in large urban metropolises, providing affordable commutes where formal public transport is inadequate and other options are unaffordable. A defining feature of these systems is their decentralized organization, with drivers providing service in response to rider demand and earning opportunities. While this structure helps fill critical mobility gaps, it can also generate inefficient service patterns when profit-driven driver route choices do not align with system-wide mobility goals. We develop an analytically tractable game-theoretic framework to study incentives underlying informal and privatized transit systems with a fixed menu of routes, quantify efficiency losses from decentralized driver route choice, and design incentive mechanisms to mitigate these inefficiencies. Here, profit-maximizing informal operators (drivers) decide where to provide service and cost-minimizing commuters (riders) decide whether to use these services. Within this framework, we establish tight price of anarchy bounds showing that decentralized, profit-maximizing driver behavior can lead to bounded yet substantial losses in cumulative driver profit and rider demand served and that these losses can be mitigated through targeted interventions: budget-balanced cross-subsidization, which uses route-specific tolls/subsidies to shape driver payoffs, and fare optimization, which changes rider demand and driver margins through centrally regulated route-level fares. Finally, numerical experiments based on a real-world informal transit system in Nalasopara, India, reinforce these findings.

2602.05835 2026-06-11 cs.GT Updated version

Bandit Social Learning with Exploration Episodes

带有探索回合的多臂老虎机社会学习

Kiarash Banihashem, Natalie Collina, Aleksandrs Slivkins

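AI summary: Studies social learning in which self-interested agents follow a multi-armed bandit protocol. Although individual agents have an incentive to explore, collective exploration fails and Bayesian regret grows linearly, showing that externally driven exploration is needed even when some exploration happens organically.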

Details
Comments
Appears in ICML 2026
AI Chinese abstract

我们研究了一种简化的社会学习动态,其中自利代理集体遵循一个简单的多臂老虎机协议。每个代理控制一个“回合”:一小段连续的决策。典型的应用场景包括用户反复与AI交互,或反复在市场上购物。虽然代理有动机在其各自的回合内进行探索,但我们表明总体探索失败:例如,其贝叶斯遗憾随时间线性增长。事实上,这种失败是(非常)典型的情况,而不仅仅是最坏情况。即使代理的每回合效用是每轮结果的某个固定函数(例如,$\min$或$\max$,而不仅仅是总和),这一结论仍然成立。因此,即使当一定量的探索有机发生时,外部驱动的探索仍然是必要的。

English abstract

We study a stylized social learning dynamics where self-interested agents collectively follow a simple multi-armed bandit protocol. Each agent controls an "episode": a short sequence of consecutive decisions. Motivating applications include users repeatedly interacting with an AI, or repeatedly shopping at a marketplace. While agents are incentivized to explore within their respective episodes, we show that the aggregate exploration fails: e.g., its Bayesian regret grows linearly over time. In fact, such failure is a (very) typical case, not just a worst-case scenario. This conclusion persists even if an agent's per-episode utility is some fixed function of the per-round outcomes: e.g., $\min$ or $\max$, not just the sum. Thus, externally driven exploration is needed even when some amount of exploration happens organically.

2601.15580 2026-06-11 econ.TH cs.GT Updated version

Screening for Choice Sets

选择集的筛选

Tan Gan, Yingkai Li

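AI summary: Studies a screening problem in which an agent privately knows the set of feasible actions or technologies and discloses only a subset to the principal; characterizes the optimal mechanism under the assumption that feasible sets are ordered by inclusion, with applications to managing persuasion, action elicitation, and production-technology elicitation.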

Details
AI Chinese abstract

我们研究了一个筛选问题,其中代理人私下知道哪些行动或技术是可行的,并且只能向委托人披露一个子集。一旦披露,可行选项是可验证的,其收益后果是公开已知的,因此私人信息涉及可行性而非收益,误报直接限制委托人的选择而非扭曲其信念。假设可行集按包含关系排序,我们建立了最优机制的简单刻画,其中委托人要么表现得好像没有不对称信息,要么局部地对更好的提议不提供奖励。我们推导了比较静态分析,并将该框架应用于说服管理、行动激励和生产技术激励等场景。

English abstract

We study a screening problem in which an agent privately knows which actions or technologies are feasible and can disclose only a subset to a principal. Once disclosed, feasible options are verifiable and their payoff consequences are publicly known, so private information concerns feasibility rather than payoffs, and misreporting restricts the principal's choices directly rather than distorting her beliefs. Assuming feasible sets are ordered by inclusion, we establish a simple characterization of the optimal mechanism, where the principal either behaves as if there is no asymmetric information or locally provides no reward for better proposals. We derive comparative statics and illustrate the framework in applications to managing persuasion, action elicitation, and production-technology elicitation.

2402.13378 2026-06-11 econ.TH cs.GT Updated version

Stable Matching as Transport: a Welfarist Perspective on Market Design

作为运输的稳定匹配:市场设计的福利视角

Federico Echenique, Joseph Root, Fedor Sandomirskiy

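AI summary: Links matching markets with aligned preferences to optimal transport theory, shows that stability, efficiency, and fairness arise as solutions to a parametric family of optimal transport problems, and reveals structural properties of matchings and trade-offs between objectives.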

Details
AI Chinese abstract

本文将偏好一致的市场匹配与最优运输理论联系起来。我们证明稳定性、效率和公平性是一族参数化最优运输问题的解。该参数刻画了规划者对不平等的态度。这种联系揭示了匹配的结构性质以及目标之间的权衡,展示了稳定性如何导致福利不平等,即使在相似主体之间也是如此。我们的模型捕捉了空间市场、学校选择和拼车等背景下的供需失衡。我们还表明,具有异质性偏好的大型市场可以很好地由一致偏好近似,从而扩展了我们结果的适用性。

English abstract

This paper links matching markets with aligned preferences to optimal transport theory. We show that stability, efficiency, and fairness emerge as solutions to a parametric family of optimal transport problems. The parameter indexes a planner's attitude towards inequality. This link offers insights into structural properties of matchings and trade-offs between objectives, showing how stability can lead to welfare inequalities, even among similar agents. Our model captures supply-demand imbalances in contexts like spatial markets, school choice, and ride-sharing. We also show that large markets with idiosyncratic preferences can be well approximated by aligned preferences, expanding the applicability of our results.
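
Illustrative sketch (not from the paper): the trade-off between stability and efficiency mentioned above can be seen in a two-by-two toy market. The sketch assumes that "aligned preferences" means both members of a pair (i, j) receive the same utility u[i][j], and it finds a stable matching by repeatedly matching the highest-utility remaining pair. The abstract does not state the parametric transport costs, so they are not reproduced here; the function names and the utility matrix are hypothetical.

```python
from itertools import permutations


def greedy_stable_matching(u):
    """Match the highest-utility remaining pair first (assumes aligned preferences)."""
    n = len(u)
    pairs = sorted(((u[i][j], i, j) for i in range(n) for j in range(n)), reverse=True)
    match, used_cols = {}, set()
    for _, i, j in pairs:
        if i not in match and j not in used_cols:
            match[i] = j
            used_cols.add(j)
    return match


def blocking_pairs(u, match):
    """Pairs (i, j) in which both agents strictly prefer each other to their partners."""
    partner_of_col = {j: i for i, j in match.items()}
    n = len(u)
    return [(i, j) for i in range(n) for j in range(n)
            if u[i][j] > u[i][match[i]] and u[i][j] > u[partner_of_col[j]][j]]


def max_welfare_matching(u):
    """Brute-force the matching with the largest total utility (small n only)."""
    n = len(u)
    best = max(permutations(range(n)), key=lambda p: sum(u[i][p[i]] for i in range(n)))
    return dict(enumerate(best))


def welfare(u, match):
    """Sum of pair utilities over the matching."""
    return sum(u[i][j] for i, j in match.items())


# Hypothetical utilities: rows are one side of the market, columns the other.
u = [[4, 3], [3, 1]]
stable = greedy_stable_matching(u)
efficient = max_welfare_matching(u)
print(stable, welfare(u, stable), blocking_pairs(u, stable))
print(efficient, welfare(u, efficient), blocking_pairs(u, efficient))
```

In this toy market the stable matching pairs the two best-off agents and leaves the remaining pair with utility 1, while the welfare-maximizing matching is blocked by the pair (0, 0). It is a small instance of stability producing unequal welfare.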