MultiAct: Text-to-Motion Generation from Composite Text via Tailored Attention Guidance
Nathan Sala, Ofir Abramovich, Ariel Shamir, Daniel Cohen-Or, Andreas Aristidou, Sigal Raab
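AI summary: Proposes MultiAct, an inference-time framework that requires no retraining or architectural modification. It addresses incomplete semantic coverage in composite text-to-motion generation by adaptively amplifying the cross-attention scores of underrepresented prompt components.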
Details
- Comments
- Accepted to SIGGRAPH 2026 conference. Project page: https://natsala13.github.io/multiact.github.io
Text-to-motion generation has progressed rapidly in recent years, offering an expressive interface for animation and human-computer interaction. However, current models remain brittle when handling prompts that describe multiple actions occurring at the same time. Rather than realizing all components of a composite description, models frequently prioritize a single dominant action and neglect the rest, leading to incomplete or ambiguous motion. We present MultiAct, an unpaired, inference-time framework for compositional text-to-motion synthesis that operates directly on pretrained motion generators without retraining or architectural modification. Our method counteracts semantic collapse by adaptively amplifying cross-attention scores associated with underrepresented prompt components. We note that effective modulation depends on prompt-specific choices, such as which tokens and layers to target, and introduce a lightweight auxiliary decision scheme that determines the most effective attention-strengthening parametrization. Extensive quantitative and qualitative evaluations demonstrate that MultiAct consistently outperforms existing baselines on composite prompts, achieving improved semantic coverage while preserving motion realism. Project page: https://natsala13.github.io/multiact.github.io.
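The abstract does not give the exact modulation rule, so the following is only a minimal sketch of what amplifying cross-attention for a neglected prompt component could look like. It assumes a single row of pre-softmax attention scores (one motion query against all text tokens) and a fixed boost factor. The names `boost_cross_attention`, `target_tokens`, and `strength` are hypothetical and are not taken from the paper, and the choice of tokens and layers, which the paper delegates to its auxiliary decision scheme, is simply passed in by hand here.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def boost_cross_attention(logits, target_tokens, strength):
    """Amplify attention toward selected text tokens (illustrative only).

    logits: pre-softmax cross-attention scores for one motion query.
    target_tokens: indices of tokens in the underrepresented component
        (assumed to be chosen elsewhere).
    strength: hypothetical boost factor; values above 1 amplify.
    Adding log(strength) to a logit multiplies its unnormalized weight
    by `strength` before renormalization.
    """
    boosted = list(logits)
    for i in target_tokens:
        boosted[i] += math.log(strength)
    return softmax(boosted)

# Toy row: token 0 dominates, token 2 is neglected.
row = [2.0, 0.5, 0.0]
before = softmax(row)
after = boost_cross_attention(row, target_tokens=[2], strength=4.0)
```

In this toy row the boosted token gains attention mass at the expense of the dominant one, while the weights still sum to one. An adaptive variant would presumably vary `strength` per prompt, token, and layer rather than fixing it.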