arXivDaily: arXiv daily digest, updated Monday through Friday
cs.MA Multiagent Systems (19)
2606.12281 2026-06-11 cs.MA cs.AI cs.LG New submission

CCKS: Consensus-based Communication and Knowledge Sharing

Jinyuan Zu, Xiaowei Lv, Yongcai Wang, Deying Li, Yunjun Han, Wenping Chen, Fengyi Zhang, Naiqi Wu

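AI summary: To address action advising's over-reliance on teacher guidance in multi-agent reinforcement learning, the paper proposes a consensus-based communication and knowledge sharing framework that builds consensus models via contrastive learning, balancing exploration against learning from teachers and improving cooperation efficiency and performance.

Details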
Abstract

In Decentralized Training and Decentralized Execution (DTDE) for cooperative Multi-Agent Reinforcement Learning (MARL), action-advising-based knowledge sharing promotes interpretable and scalable cooperation among agents. However, current action advising approaches often adhere too much to the teacher's guidance without evaluating teacher-student compatibility, which causes excessive advising, suboptimal stability, and degraded performance. To overcome these challenges, this paper presents a Consensus-based Communication and Knowledge Sharing (CCKS) framework, which allows agents to adopt recommendations based on consensus-derived constraints and to follow the teacher's instructions more smartly. This mechanism enables agents to balance exploration and learning from experienced teachers, improving overall performance. The key is the consensus model construction, for which we propose to employ contrastive learning to construct consensus models based on local observations in the agents' training phase. In action selection, agents score and choose actions based on consensus and shared knowledge. Designed as a plug-and-play solution, CCKS integrates seamlessly with existing DTDE algorithms. Experiments conducted in the Google Research Football environment and the complex StarCraft II Multi-Agent Challenge demonstrate that the integration with CCKS significantly improves cooperation efficiency, learning speed, and overall performance compared with current DTDE baselines. The code is available at this https URL.

2606.12065 2026-06-11 cs.AI cs.MA New submission

Automating Geometry-Intensive Compliance Checking in BIM: Graph-Based Semantic Reasoning Framework

Zixuan Xiao, Pei Troh Koh, Jun Ma, Jack C.P. Cheng

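AI summary: To bridge the semantic gap in automated checking of geometry-intensive regulations in BIM, the paper proposes SGR-BIM, a graph-driven reasoning framework that achieves interpretable reasoning through a cross-modal knowledge graph; it reaches 84.3% accuracy on 679 fire-safety-code queries, an 8.6% improvement over the baseline.

Details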
Abstract

Automating compliance checking for geometry-intensive regulations remains a significant technical bottleneck in Building Information Modeling (BIM), primarily due to the semantic disparity between high-level regulatory logic and structured IFC data. Existing methods, often reliant on static rule templates, struggle to traverse multi-hop reasoning chains or resolve latent spatial dependencies across multiple building entities. To address these challenges, a Spatial-Geometric Reasoning System for Building Information Modeling (SGR-BIM) is proposed as an integrative graph-driven reasoning framework. SGR-BIM dynamically constructs a cross-modal knowledge graph that aligns user intent, regulatory semantics, and BIM geometry, enabling interpretable reasoning without rigid hard-coding. Validated on 679 expert-verified queries from fire safety codes, the framework achieves 84.3% accuracy, representing an 8.6% improvement over enhanced-tool single-agent baselines. This research provides a graph-based semantic reasoning paradigm, enhancing the transparency and flexibility of automated geometric compliance-checking workflows in the Architecture, Engineering, and Construction (AEC) industry.

2606.11692 2026-06-11 cs.CY cs.MA cs.SI New submission

Evaluation of Alternative-Based Information Systems for Deliberative Polling using an Agentic Simulator

Rwaida Alssadi, Khulud Alawaji, Balaji Kasula, Muntaser Syed, Badria Alfurhood, Markus Zanker, Marius Silaghi

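AI summary: Proposes an LLM-based Agentic Bipolar Argumentation Simulator (ABAS) that evaluates recommendation mechanisms for deliberative polling by coverage and corpus diversity, and tests their robustness under adversarial voting attacks.

Details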
Abstract

Deliberative polling promises to improve collective decision-making by exposing shareholders to a broad range of arguments before they vote. Yet ensuring that every voter encounters a representative sample of the reason space, the coverage problem, remains an open challenge, particularly at scale and in adversarial or strategically motivated electorates. This paper introduces a way of evaluating solutions using the LLM-based Agentic Bipolar Argumentation Simulator (ABAS), grounded in a framework which formalises a poll as a six-tuple <Jend, Jopp, Ratt, Renh, VA, VR> of endorsing and opposing justifications, attack and enhance relations, and shareholder- and relation-weights. ABAS simulates N autonomous shareholder agents, each assigned a latent opinion according to desired distributions in [-1, 1], who sequentially vote, choose or author justifications, and optionally submit argumentation-graph links. The simulator implements recommendations that rank existing justifications by their observable endorsement mass. It evaluates the mechanism's success by coverage, namely the fraction of the corpus reason-tag set represented in the K recommendations presented to each shareholder, as a solution to the NP-hard Subsuming Justification Problem. Reported experiments characterise how creativity rate (pown), recommendation size (K), argumentation density (plinks), and population size (N) affect coverage and corpus diversity. In an authenticated electorate where Sybil attacks are impossible and only the relation graph is gameable, we stress-test the scoring with coordinated strategic voting attacks: a tag-flood attack collapses coverage, while author-count relation weighting through a reversed-PageRank rule resists the flood markedly better than uniform weights.
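
The coverage metric described in this abstract can be illustrated with a minimal sketch. It assumes each justification carries a set of reason tags and that recommendations are simply the K justifications with the highest endorsement mass; the names `coverage`, `top_k_by_endorsement`, and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def top_k_by_endorsement(endorsement: Dict[str, float], k: int) -> List[str]:
    """Rank justifications by endorsement mass (ties broken by id) and keep the top k."""
    return sorted(endorsement, key=lambda j: (-endorsement[j], j))[:k]


def coverage(recommended: List[str], tags: Dict[str, Set[str]]) -> float:
    """Fraction of the corpus reason-tag set that appears in the recommended justifications."""
    corpus_tags = set().union(*tags.values())
    if not corpus_tags:
        return 0.0
    shown: Set[str] = set()
    for justification in recommended:
        shown |= tags.get(justification, set())
    return len(shown & corpus_tags) / len(corpus_tags)


# Toy corpus (illustrative values only): two popular justifications share one tag.
tags = {"j1": {"cost"}, "j2": {"cost"}, "j3": {"safety"}, "j4": {"fairness", "cost"}}
endorsement = {"j1": 9.0, "j2": 7.0, "j3": 2.0, "j4": 1.0}

recs = top_k_by_endorsement(endorsement, k=2)
print(recs, coverage(recs, tags))
```

In this toy corpus the two most-endorsed justifications carry the same tag, so pure popularity ranking covers only one of three tags. That is the kind of collapse a tag-flood attack would exploit.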

2606.11632 2026-06-11 cs.CR cs.AI cs.DC cs.MA New submission

Sovereign Assurance Boundary: Certificate-Bound Admission for Agentic Infrastructure

Jun He, Deying Yu

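AI summary: For high-stakes operations on production resources by non-deterministic reasoning systems in agentic infrastructure, the paper proposes the Sovereign Assurance Boundary (SAB), a certificate-bound runtime admission layer that compiles agent proposals into execution contracts and binds them to cryptographic evidence, yielding verifiable and revocable authorization control.

Details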
Comments
12 pages, 1 figure, 13 tables
Abstract

Agentic infrastructure introduces a critical control-plane authorization problem: non-deterministic reasoning systems can propose high-stakes mutations to production resources, yet existing security mechanisms -- such as identity and access management (IAM), policy engines, consensus protocols, and audit logs -- either enforce static, context-unaware permissions or merely record actions post-execution. This paper introduces the Sovereign Assurance Boundary (SAB), a certificate-bound runtime admission layer for autonomous execution authority. SAB intercepts agent proposals at an assurance airlock, compiles them into typed execution contracts $C$, and binds these contracts to cryptographic evidence digests $H(E)$ and policy versions. The contracts are then routed through consequence-aware certification paths. Upon successful admission, the system emits a signed Sovereign Assurance Certificate ($\Omega$) that is strictly scoped to a specific execution identity, revocation epoch, and validity window. Finally, a sovereign execution broker verifies $\Omega$ and performs fresh pre-execution revocation and drift checks before invoking infrastructure APIs. We detail the airlock-broker architecture, formalize its admission and revocation invariants, and report preliminary feasibility measurements from a Go prototype evaluated over 2,500 admission attempts. Ultimately, this broker-enforced model prevents autonomous reasoning from directly mutating state, transforming delegated execution authority into a cryptographically verifiable, evidence-bound, revocable, and replayable runtime artifact.
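
A rough sketch of the broker-side checks this abstract describes is given below. It is an illustration under stated assumptions, not the authors' Go prototype: the field names, the canonical JSON encoding, and the use of an HMAC in place of a real digital signature are all hypothetical choices made to keep the example dependent on the standard library only.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Certificate:
    contract_hash: str     # digest of the typed execution contract C
    evidence_digest: str   # H(E)
    policy_version: str
    executor_id: str       # execution identity the certificate is scoped to
    revocation_epoch: int
    not_before: int        # validity window, Unix seconds
    not_after: int


def digest(obj) -> str:
    """SHA-256 over a sorted-key JSON encoding (an assumed canonical form)."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


def sign(cert: Certificate, key: bytes) -> str:
    """HMAC stands in for a real signature scheme in this sketch."""
    payload = json.dumps(asdict(cert), sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def broker_admits(cert, signature, key, contract, executor_id, current_epoch, now) -> bool:
    """Checks a broker might run immediately before invoking an infrastructure API."""
    return (
        hmac.compare_digest(signature, sign(cert, key))  # certificate is authentic
        and cert.contract_hash == digest(contract)       # contract has not drifted
        and cert.executor_id == executor_id              # scoped to this executor
        and cert.revocation_epoch == current_epoch       # not revoked since issuance
        and cert.not_before <= now <= cert.not_after     # inside the validity window
    )


key = b"demo-key"
contract = {"action": "scale", "resource": "db-1", "replicas": 3}
cert = Certificate(digest(contract), digest({"ticket": 42}), "policy-v7", "broker-a", 5, 1000, 2000)
sig = sign(cert, key)
print(broker_admits(cert, sig, key, contract, "broker-a", 5, 1500))
```

Bumping the revocation epoch, changing any contract field, or presenting the certificate outside its window makes `broker_admits` return `False`, which mirrors the "fresh pre-execution revocation and drift checks" mentioned in the abstract.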

2606.11284 2026-06-11 cs.MA cs.GT cs.LG New submission

Phi-Actor-Critic: Steering General-Sum Games to Pareto-Efficient Correlated Equilibria

Wongyu Lee, Francesco Lelli, Omran Ayoub, Massimo Tornatore

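AI summary: Proposes the Φ-Actor-Critic framework, which uses swap regret minimization to steer multi-agent learning toward high-social-welfare correlated equilibria, employs a centralized attention critic to estimate counterfactual regrets efficiently, and adds a Lagrangian equilibrium selection mechanism to optimize social welfare.

Details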
Comments
Accepted to IJCAI 2026
Abstract

Real-world multi-agent systems, from traffic coordination to resource allocation, are often modeled as general-sum games where individual incentives conflict with collective welfare. In these settings, the central challenge is not merely finding an equilibrium, but selecting socially desirable outcomes among many suboptimal Nash equilibria. Standard deep multi-agent reinforcement learning (MARL) methods struggle with this problem, as value-decomposition approaches are constrained by monotonicity assumptions and policy-gradient methods often converge to stable but socially inefficient equilibria. To address this limitation, we propose $\Phi$-Actor-Critic ($\Phi$-AC), a framework that leverages swap regret minimization to steer learning toward high-welfare correlated equilibria (CE). To make counterfactual regret estimation tractable in deep MARL, $\Phi$-AC employs a centralized attention critic that predicts vector-valued regrets in a single forward pass, avoiding computationally expensive counterfactual simulations. We further introduce a Lagrangian-based equilibrium selection mechanism that optimizes social welfare while enforcing stability through regret constraints. Experiments on matrix games, Multi-Agent Particle Environments (MPE), and the Melting Pot Harvest scenario demonstrate that $\Phi$-AC learns efficient and stable coordination strategies across diverse mixed-motive settings while maintaining high collective return and competitive fairness.
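
To make the link between swap regret and correlated equilibria concrete, here is a minimal sketch for a two-player matrix game. It uses the textbook definition (a joint distribution is a correlated equilibrium exactly when every player's swap regret is zero) and a standard game-of-chicken payoff table; it is not the paper's critic or training loop, and the function name and payoffs are illustrative assumptions.

```python
def swap_regret(p, u):
    """Swap regret of the row player.

    p[a][b] is the joint probability that row is recommended a and column plays b;
    u[a][b] is the row player's payoff. For each recommended action a we add the
    best expected gain from always replacing a with some alternative action.
    """
    n_actions, n_opponent = len(u), len(u[0])
    total = 0.0
    for a in range(n_actions):
        best_gain = max(
            sum(p[a][b] * (u[alt][b] - u[a][b]) for b in range(n_opponent))
            for alt in range(n_actions)
        )
        total += best_gain  # alt == a yields 0, so best_gain is never negative
    return total


# Game of chicken, row player's payoffs; action 0 = dare, action 1 = yield.
u_row = [[0, 7],
         [2, 6]]

# Classic correlated equilibrium: never recommend (dare, dare).
ce = [[0, 1 / 3],
      [1 / 3, 1 / 3]]
uniform = [[0.25, 0.25],
           [0.25, 0.25]]

print(swap_regret(ce, u_row), swap_regret(uniform, u_row))
```

Under the uniform distribution the row player gains by swapping "dare" for "yield", so the regret is positive; under `ce` no swap helps. A regret constraint of the kind the abstract mentions would penalize the first distribution and leave the second untouched.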

2606.11274 2026-06-11 cs.MA cs.LG physics.flu-dyn New submission

Multi-agent rendezvous in fluid flows via reinforcement learning

Bocheng Li, Jingran Qiu, Lihao Zhao

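AI summary: Uses multi-agent reinforcement learning (MARL) to develop physics-informed rendezvous strategies in vortical flows; the strategies significantly raise the rendezvous rate, transfer across vortex intensities, vortex scales, and swarm sizes, and keep agents from being trapped in separate vortices by breaking the symmetry of the state-action map.

Details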
Abstract

Rendezvous is a critical task for multi-agent systems, requiring agents to coordinate to meet at an unspecified location. However, achieving this in fluid environments presents a challenge, as it remains unclear how agents can exploit underlying fluid kinematics to facilitate convergence. In this study, we adopt a multi-agent reinforcement learning (MARL) approach to develop physics-informed rendezvous strategies in vortical flows. Compared to a naive strategy, where agents navigate toward their counterparts, MARL strategies significantly improve the rendezvous rate. MARL strategies also show transferability across varying vortex intensities, vortex scales, and swarm sizes. By breaking the symmetry of the state-action map, the MARL strategy leverages a non-intuitive mechanism that prevents agents from becoming trapped in separate vortices, thereby enhancing rendezvous success. Additionally, a heuristic strategy is extracted from the learned strategy and also outperforms the naive strategy. Furthermore, a theoretical analysis demonstrates that fluid deformation impedes the rendezvous process. Large finite-time Lyapunov exponents identify where fluid effects separate adjacent agents, suggesting that targets should be planned in weak-deformation regions. Our findings reveal the important role that agent-fluid interactions play in multi-agent tasks and highlight the capability of MARL to explore swarm intelligence in complex flow environments.

2605.02411 2026-06-11 cs.AI cs.IR cs.LG cs.MA Updated version

FitText: Evolving Agent Tool Ecologies via Memetic Retrieval

Kyle Zheng, Han Zhang, Renliang Sun, Chenchen Ye, Wei Wang

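AI summary: To bridge the semantic gap between users' task descriptions and tool documentation, the paper proposes FitText, which embeds retrieval in the reasoning loop and significantly improves tool retrieval through iterative refinement of natural-language pseudo-tool descriptions and memetic evolutionary selection.

Details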
Abstract

A semantic gap separates how users describe tasks from how tools are documented. As API ecosystems scale to tens of thousands of endpoints, static retrieval from the initial query alone cannot bridge this gap: the agent's understanding of what it needs evolves during execution, but its tool set does not. We identify this retrieval interface, not planning, as the binding constraint on end-to-end agent performance, and introduce FitText, a training-free framework that makes retrieval dynamic by embedding it directly in the agent's reasoning loop. FitText treats retrieval as test-time evolution of hypotheses: the agent generates natural-language pseudo-tool descriptions (revisable beliefs about the tool it needs), refines them iteratively using retrieval feedback, and explores diverse alternatives through stochastic generation. Memetic Retrieval adds evolutionary selection pressure over candidate descriptions, guided by a tool memory that avoids redundant search. On ToolRet (three domains), FitText's reformulation strategies improve NDCG@5 by 2.7 to 10.6 points over static query retrieval across all base models; on StableToolBench (16,464 APIs) with GPT-5.4-mini, Memetic reaches an 84.3% pooled pass rate, a 26.7-point absolute gain over static query retrieval.
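
For readers unfamiliar with the NDCG@5 figures quoted above, the following is a minimal sketch of the standard metric with linear gain. It assumes `ranked_rels` holds the relevance labels of all judged tools in the order the retriever returned them; the benchmark's exact variant (gain function, handling of unjudged items) may differ.

```python
import math
from typing import Sequence


def dcg_at_k(rels: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: relevance at rank i (1-based) is divided by log2(i + 1)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(rels[:k]))


def ndcg_at_k(ranked_rels: Sequence[float], k: int) -> float:
    """DCG of the returned order divided by the DCG of the ideal (sorted) order."""
    ideal = dcg_at_k(sorted(ranked_rels, reverse=True), k)
    return dcg_at_k(ranked_rels, k) / ideal if ideal > 0 else 0.0


# The single relevant tool is returned at rank 2 instead of rank 1.
print(ndcg_at_k([0, 1, 0, 0, 0], k=5))
```

A gain reported in "points" on this metric corresponds to differences in this ratio multiplied by 100.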

2606.11249 2026-06-11 cs.RO cs.LG cs.MA New submission

MASK: Multi-Agent Semantic K-Scheduling for Risk-Sensitive 6G Robotics

Ahmet Gunhan Aydin, Elif Tugce Ceran

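Affiliations: Middle East Technical University; Aselsan Inc.

AI summary: For spectrum-constrained collaborative sensing in 6G robotics, the paper proposes the Multi-Agent Semantic K-Scheduling (MASK) architecture, whose Arbiter-Assisted Semantic Information Gating (A-SIG) mechanism schedules only the K agents with the highest semantic importance; combined with a self-supervised global encoder and a distributional policy, it achieves robust, risk-aware coordination under strict bandwidth caps, with performance close to communication-unconstrained baselines.

Details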
Abstract

Realizing the vision of 6G connected robotics requires reconciling high-performance collaborative control with the rigid spectral limitations of physical wireless channels. In realistic collaborative sensing scenarios, spectral resources are quantized into finite physical resource blocks or orthogonal subcarriers, rendering simultaneous transmission by all agents infeasible. To address this, we propose Multi-Agent Semantic K-Scheduling (MASK), a control architecture designed to sustain robust, risk-aware coordination under strict instantaneous bandwidth caps. We introduce Arbiter-Assisted Semantic Information Gating (A-SIG), a lightweight coordination mechanism that enforces hard access constraints by scheduling only the top-K agents based on locally computed semantic importance scores. By aggregating these prioritized observations into a compact latent state, a self-supervised global encoder enables a distributional policy to mitigate tail risks despite data sparsity. We evaluate MASK across diverse benchmarks, demonstrating that it matches the performance of communication-unconstrained baselines even when channel access is restricted to a small fraction of the swarm size. Furthermore, the framework exhibits inherent resilience to packet erasures, validating semantic scheduling as a critical enabler for resource-constrained 6G systems.
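
The hard access constraint described above (only the top-K agents transmit in a slot) can be sketched as follows. This is an illustration, assuming each agent reports one scalar importance score to an arbiter; the function name, the tie-breaking rule, and the scores are hypothetical, and the paper's A-SIG mechanism may compute and exchange scores differently.

```python
from typing import Dict, List


def schedule_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the ids of the k agents with the highest importance scores.

    Ties are broken by agent id so the schedule is deterministic.
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [agent_id for agent_id, _ in ranked[:max(k, 0)]]


# Locally computed semantic importance scores (illustrative values only).
scores = {"r1": 0.2, "r2": 0.9, "r3": 0.5, "r4": 0.9}
print(schedule_top_k(scores, k=2))
```

Only the scheduled agents would transmit their observations in that slot; the remaining agents stay silent, which keeps channel use at exactly K resource blocks regardless of swarm size.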

2606.10546 2026-06-11 cs.MA Updated version

SkillAxe: Sharpening LLM-Authored Agent Skills Through Evaluation-Guided Self-Refinement

Srishti Gautam, Arjun Radhakrishna, Sumit Gulwani

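AI summary: Proposes SkillAxe, a framework that uses unsupervised evaluation to guide LLMs in diagnosing and refining their own skills; on SkillsBench it raises pass rates by 28% (relative) and closes 47-67% of the gap to human-authored skills.

Details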
Comments
9 pages, under review
Abstract

Skill documents, structured natural-language instructions that guide Large Language Model (LLM) agents, are critical to modern agent frameworks, yet LLMs struggle to write skills that actually work. On SkillsBench, human-authored skills improve pass rates by 16.2 percentage points, while LLM-authored skills provide no measurable gain. We introduce SkillAxe, a fully unsupervised framework that enables LLMs to iteratively diagnose and refine their own skills. SkillAxe decomposes skill quality into four interpretable dimensions (quality impact, trigger precision, instruction compliance with fault attribution, and solution-path coverage), producing structured improvement briefs that require no ground-truth labels, test suites, or environment rewards. On SkillsBench, SkillAxe improves pass rates by 28% relative over unimproved LLM skills and closes 47-67% of the gap to human-authored skills. We validate the approach as a continuous improvement engine in the wild on SpreadsheetBench, where a SkillAxe-built skill library learns from past agent trajectories and raises pass rate from 16.0% to 52.0% using only 22 skills.

2606.08102 2026-06-11 cs.RO cs.AI cs.MA Updated version

Continual Quadruped Robots Coordination via Semantic Skill Discovery

Daoqing Wang, Yuchen Xiao, Weixuan Huang, Zhilong Zhang, Shenghua Wan, Meng Li, Lei Yuan, Yang Yu

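AI summary: Proposes Conquer, a framework that uses a semantic skill library to coordinate multiple quadruped robots across continually arriving tasks while avoiding catastrophic forgetting, reaching a final average success rate of 95.6%.

Details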
Comments
22 pages, 8 figures, 11 tables. Project page: this https URL
Abstract

Multi-quadruped coordination has attracted increasing attention due to its enhanced payload capacity, broader contact coverage, and improved adaptability to challenging tasks. Existing methods for multi-quadruped manipulation typically focus on predefined or closed task families, often relying on multi-agent reinforcement learning (MARL) to train task-specific coordination policies. However, such methods struggle in open-ended continual learning settings, where tasks arrive sequentially and robots are expected to acquire new coordination skills while reusing previously learned ones without catastrophic forgetting. To address this challenge, we propose Conquer, a semantic skill-library framework that formulates continual multi-quadruped coordination as a retrieve-adapt-update process. First, to accommodate varying team sizes across tasks, we design a team-structured Self-Allies-Goal (SAG) backbone that supports variable-cardinality robot teams by explicitly modeling each robot's own state, teammate context, and task goal. For each incoming task, Conquer constructs a task-level semantic descriptor from pre-execution information and retrieves a relevant skill from the library for adaptation. After successful execution, Conquer updates the skill library by extracting trajectory-level semantic descriptors and organizing them according to semantic distance, thereby enabling continual skill accumulation and cross-task knowledge transfer. Simulation experiments show that Conquer achieves a final average success rate of 95.6%, demonstrating strong forward transfer and negligible catastrophic forgetting. Real-world rollouts on Unitree Go2 teams further validate the deployment feasibility of Conquer for practical multi-quadruped coordination. Simulation and real-robot demonstration videos are available at: https://conquer-project.pages.dev/.

2605.12655 2026-06-11 cs.AI cs.MA Updated version

Robust Instruction Compliance in Cooperative Multi-Agent Reinforcement Learning

Wo Wei Lin, Ethan Rathbun, Enrico Marchesini, Xiang Zhi Tan

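AI summary: For external instructions that interrupt ongoing behavior and conflict with long-horizon objectives, the paper proposes Macro-Action Value Correction for Instruction Compliance (MAVIC), which corrects Bellman backups at instruction boundaries to obtain consistent value estimates, maintaining high instruction compliance and base task performance in complex cooperative environments.

Details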
Abstract

Multi-agent reinforcement learning (MARL) in real-world use cases may need to adapt to external natural language instructions that interrupt ongoing behavior and conflict with long-horizon objectives. However, conditioning rewards on instructions introduces a fundamental failure mode as Bellman updates couple value estimates across instruction contexts, leading to inconsistent values when instructions interrupt macro-actions. We propose Macro-Action Value Correction for Instruction Compliance (MAVIC), which corrects Bellman backups at instruction boundaries by correcting the incoming instruction objective and restoring the continuation value under the current objective. Unlike reward shaping, MAVIC modifies the bootstrapping target itself, enabling consistent value estimation under stochastic instruction switching within a unified policy. We provide theoretical analysis and an actor-critic implementation, and show that MAVIC achieves high instruction compliance while preserving base task performance in increasingly complex cooperative multi-agent environments.

2510.18289 2026-06-11 cs.CL cs.CY cs.MA Updated version

Food4All: An Agentic Framework and Benchmark for Food Resource Navigation with Adaptive User Understanding

Yiyang Li, Weixiang Sun, Tianyi Ma, Kaiwen Shi, Zheyuan Zhang, Yanfang Ye

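AI summary: Proposes the Food4All framework, which pairs a food search tool with 300 multi-turn evaluation tasks, evaluates six large language models on 686 Indiana food resources, and diagnoses their shortcomings in handling constraints and non-ideal user interactions.

Details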
Comments
We have further refined the benchmark construction and experimental presentation to improve clarity and consistency. The revised version includes updated task design, food-resource data, and evaluation details to better align the benchmark with the intended food resource referral setting. These changes provide a more precise presentation of the experimental findings
Abstract

Food assistance referral requires conversational agents to translate underspecified, often noisy help-seeking dialogues into locally valid resource recommendations. We present Food4All, an agentic food-resource referral framework and benchmark grounded in 686 structured Indiana food resources. Food4All couples a food-specific search tool with 300 multi-turn evaluation tasks spanning single food needs, composite cases with access or document constraints, and five non-ideal user interaction traits: unreasonable demands, rambling responses, impatience, incomplete answers, and inconsistent information. We evaluate six Large Language Models (LLMs) on requirement grounding, resource retrieval, final referral correctness, and interaction efficiency. Although the strongest model achieves 96.33% referral accuracy, our diagnostics reveal persistent failures in grounding schedule, eligibility, intake, and document constraints, as well as failures to preserve valid retrieved resources in the final recommendation. Trait-level analysis further shows that different non-ideal behaviors stress different parts of the referral pipeline. Food4All provides a controlled testbed for studying tool-calling agents in constraint-sensitive food assistance referral under realistic user interaction challenges.

2604.20348 2026-06-11 cs.RO cs.AI cs.MA Updated version

Bimanual Robot Manipulation via Multi-Agent In-Context Learning

Alessio Palma, Indro Spinelli, Vignesh Prasad, Luca Scofano, Yufeng Jin, Georgia Chalvatzaki, Fabio Galasso

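AI summary: Proposes BiCICLe, a framework that models bimanual manipulation as a multi-agent leader-follower problem and decouples the action space to enable few-shot learning with standard LLMs; it reaches a 70.5% average success rate on the TWIN benchmark, surpassing training-free baselines.

Details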
Abstract

Large Language Models (LLMs) have emerged as powerful reasoning engines for embodied control. In particular, In-Context Learning (ICL) enables off-the-shelf, text-only LLMs to predict robot actions without any task-specific training while preserving their generalization capabilities. Applying ICL to bimanual manipulation remains challenging as the high-dimensional joint action space and tight inter-arm coordination constraints rapidly overwhelm standard context windows. To address this, we introduce BiCICLe (Bimanual Coordinated In-Context Learning), the first framework that enables standard LLMs to perform few-shot bimanual manipulation without fine-tuning. BiCICLe frames bimanual control as a multi-agent leader-follower problem, decoupling the action space into sequential, conditioned single-arm predictions. Evaluated on 13 tasks from the TWIN benchmark, BiCICLe achieves a 70.5% average success rate, outperforming the best training-free baseline by 6.1 percentage points and surpassing most supervised methods. We also demonstrate superior real-world performance on 3 tasks without hardware-specific retraining.

2603.14867 2026-06-11 cs.LG cs.AI cs.GT cs.MA Updated version

Sample-Efficient Hypergradient Estimation for Decentralized Bi-Level Reinforcement Learning

Mikoto Kudo, Takumi Tanabe, Akifumi Wachi, Youhei Akimoto

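AI summary: For decentralized bi-level reinforcement learning in which the leader cannot intervene in the follower's optimization process, the paper proposes a hypergradient estimation method based on the Boltzmann covariance trick that enables sample-efficient optimization in high-dimensional decision spaces and is the first applied to 2-player Markov games.

Details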
Comments
29 pages. Extended version of the paper accepted to ICAPS 2026
Abstract

Many strategic decision-making problems, such as environment design for warehouse robots, can be naturally formulated as bi-level reinforcement learning (RL), where a leader agent optimizes its objective while a follower solves a Markov decision process (MDP) conditioned on the leader's decisions. In many situations, a fundamental challenge arises when the leader cannot intervene in the follower's optimization process; it can only observe the optimization outcome. We address this decentralized setting by deriving the hypergradient of the leader's objective, i.e., the gradient of the leader's strategy that accounts for changes in the follower's optimal policy. Unlike prior hypergradient-based methods that require extensive data for repeated state visits or rely on gradient estimators whose complexity can increase substantially with the high-dimensional leader's decision space, we leverage the Boltzmann covariance trick to derive an alternative hypergradient formulation. This enables efficient hypergradient estimation solely from interaction samples, even when the leader's decision space is high-dimensional. Additionally, to our knowledge, this is the first method that enables hypergradient-based optimization for 2-player Markov games in decentralized settings. Experiments highlight the impact of hypergradient updates and demonstrate our method's effectiveness in both discrete and continuous state tasks.
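
The abstract does not spell out the Boltzmann covariance trick, so the sketch below only checks a generic identity that the name suggests, and it may differ in detail from the paper's estimator. For a Boltzmann policy π(a) ∝ exp(Q(a; θ)/β) and a function f that does not depend on θ, the derivative of E_π[f] with respect to θ equals Cov_π(f, ∂Q/∂θ)/β. The linear form Q(a; θ) = q0[a] + θ·g[a] and all numbers are hypothetical.

```python
import math
from typing import List


def softmax(q: List[float], beta: float) -> List[float]:
    """Boltzmann distribution with temperature beta (max-shifted for numerical stability)."""
    top = max(q)
    weights = [math.exp((x - top) / beta) for x in q]
    norm = sum(weights)
    return [w / norm for w in weights]


def policy(theta: float, q0: List[float], g: List[float], beta: float) -> List[float]:
    """Policy for the assumed linear parameterization Q(a; theta) = q0[a] + theta * g[a]."""
    return softmax([a + theta * b for a, b in zip(q0, g)], beta)


def expected_f(theta, q0, g, f, beta) -> float:
    return sum(p * x for p, x in zip(policy(theta, q0, g, beta), f))


def covariance_gradient(theta, q0, g, f, beta) -> float:
    """d/dtheta E_pi[f] computed as Cov_pi(f, dQ/dtheta) / beta, where dQ/dtheta = g."""
    pi = policy(theta, q0, g, beta)
    mean_f = sum(p * x for p, x in zip(pi, f))
    mean_g = sum(p * y for p, y in zip(pi, g))
    cov = sum(p * (x - mean_f) * (y - mean_g) for p, x, y in zip(pi, f, g))
    return cov / beta


q0, g, f = [0.0, 1.0, -0.5], [1.0, -2.0, 0.5], [3.0, 1.0, 2.0]
beta, theta, h = 0.7, 0.3, 1e-6

analytic = covariance_gradient(theta, q0, g, f, beta)
numeric = (expected_f(theta + h, q0, g, f, beta) - expected_f(theta - h, q0, g, f, beta)) / (2 * h)
print(analytic, numeric)
```

The appeal of a covariance form is that the covariance can be estimated from sampled actions alone, which is consistent with the abstract's claim of estimation "solely from interaction samples".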

2511.13452 2026-06-11 physics.soc-ph cs.MA Updated version

Collective decision-making with higher-order interactions on $d$-uniform hypergraphs

Thierry Njougouo, Timoteo Carletti, Elio Tuci

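AI summary: Studies an opinion dynamics model with group interactions on $d$-uniform hypergraphs; a mean-field bifurcation analysis identifies two critical thresholds and shows that interaction group size and the quality ratio determine consensus stability, and that large groups can lead to adoption of the inferior option.

Details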
Abstract

Understanding how group interactions influence opinion dynamics is fundamental to the study of collective behavior. In this work, we propose and study a model of opinion dynamics on $d$-uniform hypergraphs, where individuals interact through group-based (higher-order) structures rather than simple pairwise connections. Each of the two opinions, $A$ and $B$, is characterized by a quality, $Q_A$ and $Q_B$ respectively, and agents update their opinions according to a general mechanism that takes into account the weighted fraction of agents supporting either opinion and the pooling error, $\alpha$, a proxy for the information lost during the interaction. Through bifurcation analysis of the mean-field model, we identify two critical thresholds, $\alpha_{\text{crit}}^{(1)}$ and $\alpha_{\text{crit}}^{(2)}$, which delimit stability regimes for the consensus states. These analytical predictions are validated through extensive agent-based simulations on both random and scale-free hypergraphs. Moreover, the analytical framework demonstrates that the bifurcation structure and critical thresholds are independent of the underlying topology of the higher-order network, depending solely on the parameter $d$, i.e., the size of the interaction groups, and on the quality ratio. Finally, we bring to the fore a nontrivial effect: large interaction groups can drive the system toward the adoption of the worst option.

2509.14860 2026-06-11 cs.CV cs.AI cs.CL cs.MA 版本更新

MARIC: Multi-Agent Reasoning for Image Classification

MARIC:用于图像分类的多智能体推理

Wonduk Seo, Minhyeong Yu, Hyunjin An, Seunghyun Lee

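AI summary: Proposes MARIC, a multi-agent framework that decomposes image classification into a collaborative reasoning process in which an Outliner Agent, Aspect Agents, and a Reasoning Agent perform multi-perspective analysis and synthesis, significantly outperforming baselines on four benchmark datasets.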

详情
Comments
11 pages, preprint
AI中文摘要

图像分类传统上依赖于参数密集型模型训练,需要大规模标注数据集和大量微调才能达到有竞争力的性能。虽然最近的视觉语言模型(VLM)缓解了其中一些限制,但它们仍然受限于对单次表示的依赖,往往无法捕捉视觉内容的互补方面。在本文中,我们介绍了基于多智能体的图像分类推理(MARIC),这是一个多智能体框架,将图像分类重新表述为协作推理过程。MARIC首先利用大纲智能体分析图像的全局主题并生成有针对性的提示。基于这些提示,三个方面智能体沿着不同的视觉维度提取细粒度描述。最后,推理智能体通过集成反思步骤综合这些互补输出,产生用于分类的统一表示。通过明确地将任务分解为多个视角并鼓励反思性综合,MARIC减轻了参数繁重训练和单一VLM推理的缺点。在4个不同的图像分类基准数据集上的实验表明,MARIC显著优于基线,突出了多智能体视觉推理在鲁棒且可解释的图像分类中的有效性。

英文摘要

Image classification has traditionally relied on parameter-intensive model training, requiring large-scale annotated datasets and extensive fine-tuning to achieve competitive performance. While recent vision-language models (VLMs) alleviate some of these constraints, they remain limited by their reliance on single-pass representations, often failing to capture complementary aspects of visual content. In this paper, we introduce Multi Agent based Reasoning for Image Classification (MARIC), a multi-agent framework that reformulates image classification as a collaborative reasoning process. MARIC first utilizes an Outliner Agent to analyze the global theme of the image and generate targeted prompts. Based on these prompts, three Aspect Agents extract fine-grained descriptions along distinct visual dimensions. Finally, a Reasoning Agent synthesizes these complementary outputs through an integrated reflection step, producing a unified representation for classification. By explicitly decomposing the task into multiple perspectives and encouraging reflective synthesis, MARIC mitigates the shortcomings of both parameter-heavy training and monolithic VLM reasoning. Experiments on 4 diverse image classification benchmark datasets demonstrate that MARIC significantly outperforms baselines, highlighting the effectiveness of multi-agent visual reasoning for robust and interpretable image classification.

2402.00972 2026-06-11 cs.LG cs.MA physics.comp-ph

Closure Discovery for Coarse-Grained Partial Differential Equations Using Grid-based Reinforcement Learning

Jan-Philipp von Bassewitz, Sebastian Kaltenbach, Petros Koumoutsakos

详情
Comments
Conference on Parsimony and Learning (CPAL)
英文摘要

Reliable predictions of critical phenomena, such as weather, wildfires, and epidemics, often rely on models described by Partial Differential Equations (PDEs). However, simulations that capture the full range of spatio-temporal scales described by such PDEs are often prohibitively expensive. Consequently, coarse-grained simulations are usually deployed that adopt various heuristics and empirical closure terms to account for the missing information. We propose a novel and systematic approach for identifying closures in under-resolved PDEs using grid-based Reinforcement Learning. This formulation incorporates inductive bias and exploits locality by deploying a central policy represented efficiently by a Fully Convolutional Network (FCN). We demonstrate the capabilities and limitations of our framework through numerical solutions of the advection equation and the Burgers' equation. Our results show accurate predictions for in- and out-of-distribution test cases as well as a significant speedup compared to resolving all scales.

2307.01472 2026-06-11 cs.AI cs.LG cs.MA 版本更新

Improving Generalization and Data Efficiency with Diffusion in Offline Multi-agent RL

通过扩散模型提升离线多智能体强化学习的泛化能力与数据效率

Zhuoran Li, Ling Pan, Jiatai Huang, Longbo Huang

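AI summary: Proposes the Diffusion Offline Multi-agent Model (DOM2), which uses a diffusion model to enhance policy expressiveness and diversity and combines it with trajectory-based data reweighting, significantly improving performance, generalization, and data efficiency in offline MARL.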

详情
AI中文摘要

我们提出了一种新颖的扩散离线多智能体模型(DOM2),用于离线多智能体强化学习(MARL)。与主要依赖策略设计中保守性的现有算法不同,DOM2基于扩散模型增强了策略的表达力和多样性。具体来说,我们将扩散模型融入策略网络,并在训练中提出了一种基于轨迹的数据重加权方案。这些关键要素显著提高了算法对环境变化的鲁棒性,并在性能、泛化和数据效率方面取得了显著提升。我们的大量实验结果表明,DOM2在所有多智能体粒子和多智能体MuJoCo环境中均优于现有最先进方法,并且由于其高表达力和多样性,在迁移环境中(在评估的30个设置中有28个)泛化能力显著更强。此外,DOM2具有超高的数据效率,与现有算法相比,实现相同性能所需数据不超过5%(数据效率提升20倍)。

英文摘要

We present a novel Diffusion Offline Multi-agent Model (DOM2) for offline Multi-Agent Reinforcement Learning (MARL). Unlike existing algorithms that rely mainly on conservatism in policy design, DOM2 enhances policy expressiveness and diversity based on a diffusion model. Specifically, we incorporate a diffusion model into the policy network and propose a trajectory-based data-reweighting scheme in training. These key ingredients significantly improve algorithm robustness against environment changes and achieve significant improvements in performance, generalization, and data efficiency. Our extensive experimental results demonstrate that DOM2 outperforms existing state-of-the-art methods in all multi-agent particle and multi-agent MuJoCo environments, and generalizes significantly better to shifted environments (in $28$ out of $30$ settings evaluated) thanks to its high expressiveness and diversity. Moreover, DOM2 is ultra data-efficient, requiring no more than $5\%$ of the data to achieve the same performance as existing algorithms (a $20\times$ improvement in data efficiency).
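
As a quick check of the quoted factor (a worked line added here, not taken from the paper): if a baseline needs $N$ samples to reach a given performance and DOM2 needs at most $0.05\,N$, then the gain in data efficiency is at least

$$\frac{N}{0.05\,N} = 20 .$$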

2201.09691 2026-06-11 cs.MA econ.TH math.CO 版本更新

Multidimensional Manhattan Preferences

多维曼哈顿偏好

Jiehua Chen, Martin Nöllenburg, Sofia Simola, Anaïs Villedieu, Markus Wallinger

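AI summary: Studies the existence of $d$-Manhattan preference profiles and their minimal counterexamples, proves that every preference profile is $d$-Manhattan whenever $d \ge \min(n, m - 1)$, and characterizes minimal forbidden substructures for 2-Manhattan profiles.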

详情
AI中文摘要

一个偏好谱系(即选民对一组备选方案的线性偏好序的集合)具有$m$个备选方案和$n$个选民,称为$d$-曼哈顿(相应地,$d$-欧几里得),如果备选方案和选民都可以放置在$d$维空间中,使得在每对备选方案之间,每个选民偏好与其曼哈顿(相应地,欧几里得)距离较短的方案。我们研究$d$-曼哈顿偏好谱系如何依赖于$m$和$n$的值。首先,我们提供显式构造,证明当$d \ge \min(n, m - 1)$时,每个具有$m$个备选方案和$n$个选民的偏好谱系都是$d$-曼哈顿的。我们进一步将这一积极结果推广到其他$p$-范数,其中$p \in R_{\ge 1} \cup \{\infty\}$。其次,对于$d = 2$,我们发展出禁止子结构——小规模选民集合中的偏好模式,这些模式约束任何2-曼哈顿嵌入——并利用它们证明最小的非2-曼哈顿偏好谱系要么有3个选民和6个备选方案,要么有4个选民和5个备选方案,要么有5个选民和4个备选方案。这比$d$-欧几里得偏好的情况更复杂(参见(Bogomolnaia and Laslier, 2007)和(Bulteau and Chen, 2022))。我们还证明$d$-曼哈顿偏好蕴含$(2d-1)$维单峰性,而2-曼哈顿性与单峰性和单交叉性不可比较。

英文摘要

A preference profile (i.e., a collection of linear preference orders of the voters over a set of alternatives) with $m$ alternatives and $n$ voters is $d$-Manhattan (resp. $d$-Euclidean) if both the alternatives and the voters can be placed into a $d$-dimensional space such that between each pair of alternatives, every voter prefers the one which has a shorter Manhattan (resp. Euclidean) distance to the voter. We study how $d$-Manhattan preference profiles depend on the values $m$ and $n$. First, we provide explicit constructions to show that each preference profile with $m$ alternatives and $n$ voters is $d$-Manhattan whenever $d \ge \min(n, m - 1)$. We further extend this positive result to other $p$-norms with $p \in R_{\ge 1} \cup \{\infty\}$. Second, for $d = 2$, we develop forbidden substructures -- preference patterns among small sets of voters that constrain any 2-Manhattan embedding -- and use them to show that the smallest non-2-Manhattan preference profile has either 3 voters and 6 alternatives, or 4 voters and 5 alternatives, or 5 voters and 4 alternatives. This is more complex than the case with $d$-Euclidean preferences (see (Bogomolnaia and Laslier, 2007) and (Bulteau and Chen, 2022)). We also show that $d$-Manhattan preferences imply $(2d-1)$-dimensional single-peakedness, while 2-Manhattanness is incomparable with single-peakedness and single-crossingness.
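
The definition of a $d$-Manhattan profile in the abstract above can be illustrated with a minimal Python sketch. It is not code from the paper: the function names (`manhattan`, `induced_profile`, `realizes`) and the example coordinates are assumptions made for illustration. Given a placement of voters and alternatives in $d$-dimensional space, it derives each voter's preference order by sorting the alternatives by Manhattan distance, and checks whether the placement realizes a given profile. It only verifies a candidate embedding; it does not search for one.

```python
from typing import List, Sequence

Point = Sequence[float]


def manhattan(p: Point, q: Point) -> float:
    """Manhattan (L1) distance between two points of equal dimension."""
    return sum(abs(a - b) for a, b in zip(p, q))


def induced_profile(voters: List[Point], alternatives: List[Point]) -> List[List[int]]:
    """Return, for each voter, the alternative indices ordered from most to
    least preferred, i.e. by increasing Manhattan distance to the voter."""
    profile = []
    for v in voters:
        dists = [manhattan(v, a) for a in alternatives]
        # A linear order needs strictly different distances for this voter.
        if len(set(dists)) != len(dists):
            raise ValueError("tie in distances: no linear order is induced")
        profile.append(sorted(range(len(alternatives)), key=lambda j: dists[j]))
    return profile


def realizes(voters: List[Point], alternatives: List[Point],
             profile: List[List[int]]) -> bool:
    """Check whether the given placement realizes the given profile."""
    try:
        return induced_profile(voters, alternatives) == profile
    except ValueError:
        return False


# Hypothetical 2-dimensional placement: two voters, three alternatives.
voters = [(0, 0), (3, 1)]
alternatives = [(1, 0), (0, 2), (3, 3)]
print(induced_profile(voters, alternatives))  # [[0, 1, 2], [2, 0, 1]]
```

In this example the first voter is at distances 1, 2, and 6 from the three alternatives and the second at distances 3, 4, and 2, so the placement witnesses that the profile `[[0, 1, 2], [2, 0, 1]]` is 2-Manhattan.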