Mod-Guide: An LLM-based Content Moderation Feedback System to Address Insensitive Speech toward Indigenous Ethnic and Religious Minority Communities
Dipto Das, Achhiya Sultana, Ankit Singh Chauhan, Saadia Binte Alam, Mohammad Shidujaman, Shion Guha, Sunandan Chakraborty, Syed Ishtiaque Ahmed
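AI summary: This paper examines the epistemic limits of LLM-based moderation systems in recognizing insensitive speech toward Bangladesh's Hindu and Chakma communities. By co-creating a cultural corpus and applying retrieval-augmented generation (RAG), the authors develop the Mod-Guide tool, which improves model sensitivity to minority viewpoints.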
Language operates as a mechanism of both marginalization and resistance, especially for minority communities navigating insensitive and harmful speech online. As content moderation increasingly depends on large language models (LLMs), concerns arise about whether these systems can recognize culturally insensitive speech: language that disregards or marginalizes the cultural and religious perspectives of historically underrepresented communities, often through implicit erasure, misrepresentation, or normative framing rather than overt hostility. Focusing on Bangladesh's Hindu and Chakma communities (the country's largest religious and Indigenous ethnic minorities, respectively), this paper investigates the epistemic limits of LLM-based moderation systems and explores methods for incorporating minority perspectives. We co-created a culturally grounded corpus of insensitive speech with community members and integrated their narratives into moderation pipelines using retrieval-augmented generation (RAG). Our tool, Mod-Guide, improves LLM sensitivity to minority viewpoints by leveraging contextual cues derived from lived experience. Through mixed-method evaluations involving both minority and majority participants, we demonstrate that RAG-enhanced moderation responses are more contextually accurate and perceived differently across ethnic lines. This work advances research in human-computer interaction, AI ethics, and social computing by foregrounding restorative justice and hermeneutical inclusion in the design of content moderation systems.
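The abstract does not describe how Mod-Guide is implemented, so the following is only a minimal sketch of the general retrieval-augmented pattern it names: retrieve the community notes most similar to a post, then place them in the prompt given to a moderation model. Everything here is an assumption made for illustration: the placeholder notes (not taken from the paper's corpus), the bag-of-words cosine similarity standing in for whatever retriever the authors used, the prompt wording, and the function names `retrieve` and `build_prompt`. No LLM is called.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder notes for illustration only; NOT from the paper's corpus.
COMMUNITY_NOTES = [
    "Placeholder note A: describing a minority festival as a mere 'local custom' erases its religious meaning.",
    "Placeholder note B: referring to an Indigenous group only by an outsider-assigned label misrepresents its identity.",
    "Placeholder note C: treating the majority language as the default for everyone frames other languages as deviations.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(post, notes, k=2):
    """Return the k notes most similar to the post (assumed retriever, not the authors')."""
    query = Counter(tokenize(post))
    ranked = sorted(
        notes,
        key=lambda note: cosine(query, Counter(tokenize(note))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(post, notes, k=2):
    """Assemble a moderation prompt that includes the retrieved community context."""
    context = "\n".join(f"- {note}" for note in retrieve(post, notes, k))
    return (
        "You are assisting a content moderator.\n"
        "Community context:\n"
        f"{context}\n"
        "Post under review:\n"
        f"{post}\n"
        "Explain whether the post may be culturally insensitive and why."
    )


if __name__ == "__main__":
    print(build_prompt("That festival is just a local custom.", COMMUNITY_NOTES))
```

In a real pipeline the returned prompt would be sent to an LLM, and the word-overlap retriever would likely be replaced by an embedding-based one; both choices are left open here because the abstract does not specify them.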