arXivDaily: arXiv daily academic digest, updated Monday through Friday

Vision and Robotics

Image Generation

Image generation, text-to-image, image editing, diffusion models, and controllable generation.

Entries for today/current date: 2. Signal sources: cs.CV, cs.GR, cs.MM
2508.03483 2026-06-18 cs.CV cs.AI Version update 90%

When Cars Have Stereotypes: Auditing Demographic Bias in Objects from Text-to-Image Models


Dasol Choi, Jihwan Lee, Minjae Lee, Minsuk Kahng

Affiliations: AIM Intelligence, Yonsei University

Topic match: Text-to-image. Audits demographic bias in text-to-image models, which relates to image generation.

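AI summary: Proposes the SODA framework, which uses three metrics to systematically measure demographic bias in objects generated by text-to-image models. It finds that neutral prompts implicitly skew toward middle-aged and White people, and that demographic cues lead to highly skewed, stereotypical outputs.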


Abstract

While prior research on text-to-image generation has predominantly focused on biases in human depictions, demographic bias in generated objects remains relatively underexplored. We introduce SODA (Stereotyped Object Diagnostic Audit), a novel framework for systematically measuring these biases through automated attribute discovery and three standardized metrics: Base vs. Demographic Divergence (BDS), Cross-Demographic Disparity (CDS), and Visual Attribute Concentration (VAC). Applying SODA to 8,000 images across five state-of-the-art models and eight object categories (e.g., cars), we find that "neutral" prompts produce outputs most visually similar to middle-aged and White people, suggesting these groups are implicitly over-represented in model defaults. Furthermore, demographic cues trigger highly skewed stereotypical outputs: 26.6% of object-model-demographic combinations produce results where all 20 generated images share the exact same attribute value (e.g., rose gold laptops for women). Finally, prompt-level debiasing reduces inter-group disparity but paradoxically collapses within-group diversity, replacing one stereotype with another. SODA offers a practical pipeline for making these implicit associations measurable, serving as a step toward more responsible AI development.
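The concentration finding above (every one of the 20 images in a combination sharing a single attribute value) can be illustrated with a small calculation. This is only a sketch on made-up toy data: the function names `top_value_share` and `fully_concentrated_fraction` are hypothetical, and the abstract does not give the formula for VAC, so this should not be read as the paper's metric.

```python
from collections import Counter


def top_value_share(values):
    """Fraction of images that carry the most common attribute value."""
    if not values:
        raise ValueError("need at least one attribute value")
    most_common_count = Counter(values).most_common(1)[0][1]
    return most_common_count / len(values)


def fully_concentrated_fraction(groups):
    """Fraction of combinations in which every image has the same value."""
    if not groups:
        raise ValueError("need at least one combination")
    collapsed = sum(1 for values in groups.values() if len(set(values)) == 1)
    return collapsed / len(groups)


# Toy data (assumed, not from the paper): 20 attribute labels per
# (object, model, demographic) combination.
toy_groups = {
    ("laptop", "model_a", "women"): ["rose gold"] * 20,
    ("laptop", "model_a", "men"): ["black"] * 12 + ["silver"] * 8,
}

for key, values in toy_groups.items():
    print(key, top_value_share(values))
print("fully concentrated:", fully_concentrated_fraction(toy_groups))
```

On this toy input, one of the two combinations is fully collapsed, so the reported fraction is 0.5. The 26.6% figure in the abstract is the analogous fraction over the paper's own object-model-demographic combinations.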

2605.14877 2026-06-18 cs.CV Version update 85%

HeatKV: Head-tuned KV-cache Compression for Visual Autoregressive Modeling


Jonathan Cederlund, Axel Berg, William Isaksson, Durmus Alp Emre Acar, Chuteng Zhou, Pontus Giselsson

Affiliations: Dept. of Automatic Control, Lund University; Arm

Topic match: Text-to-image. Proposes the HeatKV compression method for visual autoregressive image generation.

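AI summary: Proposes HeatKV, which adapts KV-cache allocation per attention head according to that head's attention to previously generated scales. This gives more efficient KV-cache compression, improving memory use while preserving image generation quality.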

Comments 18 pages total including appendix; 6 main-paper figures, 2 appendix figures; 4 tables


Abstract

Visual Autoregressive (VAR) models have recently demonstrated impressive image generation quality while maintaining low latency. However, they suffer from severe KV-cache memory constraints, often requiring gigabytes of memory per generated image. We introduce HeatKV, a novel compression method that adapts cache allocation in each head based on its attention to previously generated scales. Using a small offline calibration set, the attention heads are ranked according to their attention scores over prior scales. Based on this ranking, we construct a static pruning schedule tailored to a given memory budget. Applied to the Infinity-2B model, HeatKV achieves $2 \times$ higher compression ratio in memory allocation for KV cache compared to existing methods, while maintaining similar or better image fidelity, prompt alignment and human perception score. Our method achieves a new state-of-the-art (SOTA) for VAR model KV-cache compression, showcasing the effectiveness of fine-grained, head-specific cache allocation. Code and calibration script available at https://github.com/arm-research/heatkv.
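The rank-then-schedule idea described in the abstract can be sketched as follows. This is an illustrative assumption, not the HeatKV algorithm: the abstract does not specify the allocation rule, so the greedy rule here (each head gets a floor, then the remaining budget is filled in rank order) and the names `rank_heads`, `static_schedule`, `max_tokens`, and `min_tokens` are all hypothetical.

```python
def rank_heads(prior_scale_attention):
    """Head indices sorted from most to least attention on prior scales."""
    return sorted(
        range(len(prior_scale_attention)),
        key=lambda head: prior_scale_attention[head],
        reverse=True,
    )


def static_schedule(prior_scale_attention, total_budget, max_tokens, min_tokens=0):
    """Assign a fixed number of cached tokens to each head.

    Assumed rule (not from the paper): every head keeps at least
    min_tokens; leftover budget goes to heads in rank order, capped at
    max_tokens per head.
    """
    num_heads = len(prior_scale_attention)
    if max_tokens < min_tokens:
        raise ValueError("max_tokens must be >= min_tokens")
    if total_budget < num_heads * min_tokens:
        raise ValueError("budget too small for the per-head floor")

    schedule = [min_tokens] * num_heads
    remaining = total_budget - num_heads * min_tokens
    for head in rank_heads(prior_scale_attention):
        extra = min(max_tokens - min_tokens, remaining)
        schedule[head] += extra
        remaining -= extra
    return schedule


# Toy calibration scores (assumed): mean attention mass that each of four
# heads places on previously generated scales.
scores = [0.1, 0.7, 0.4, 0.2]
print(rank_heads(scores))
print(static_schedule(scores, total_budget=100, max_tokens=40, min_tokens=10))
```

Because the scores come from an offline calibration set, a schedule like this is computed once per memory budget and then reused at inference time, which is what makes it static. In this toy case the two heads that attend most to prior scales keep 40 tokens each and the other two keep the floor of 10.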