SafeLens: Deliberate and Efficient Video Guardrails with Fast-and-Slow Screening
Shahriar Kabir Nahin, Hadi Askari, Muhao Chen, Anshuman Chhabra
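AI summary: This work proposes SafeLens, a video guardrail framework that uses a fast-and-slow inference architecture for efficient video content moderation. It also builds a high-quality dataset and adds structured Chain-of-Thought traces to address the limits of training-time scaling, achieving the best performance on real-world and AI-generated video benchmarks while significantly reducing inference cost.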
Details
The rapid growth of online video platforms and AI-generated content has made reliable video guardrails a key challenge for safety and real-world deployment. While most videos can be screened through fast pattern recognition, a small subset requires deeper reasoning over temporally complex content and nuanced policy constraints. Existing approaches typically apply large vision-language models uniformly to all inputs, resulting in high inference costs and inefficient allocation of computation. We propose SafeLens, a video guardrail framework that introduces a fast-and-slow inference architecture for efficient and accurate content moderation, with computational cost that varies across inputs. Additionally, we construct a high-quality dataset by applying influence-guided filtering to the SafeWatch dataset, retaining only 2.4% of the original data. To further address the limitations of training-time scaling, we enable test-time reasoning by augmenting the filtered data with structured Chain-of-Thought traces. Across real-world and AI-generated video benchmarks, SafeLens achieves state-of-the-art performance, outperforming strong open-source video guardrails (e.g., SafeWatch-8B, OmniGuard-7B) and closed-source models (e.g., GPT-5.4, Gemini-3.1-pro) while significantly reducing inference cost, demonstrating that efficient design can be more effective than scaling data or model size alone.
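The fast-and-slow idea can be illustrated with a minimal sketch. The abstract does not state how SafeLens decides which videos need slow reasoning, so the confidence threshold used here is an assumption, and `screen`, `fast_stub`, `slow_stub`, and `Verdict` are hypothetical names with toy stand-ins in place of real models.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# A model maps a video to (is_unsafe, confidence in [0, 1]).
Model = Callable[[dict], Tuple[bool, float]]


@dataclass
class Verdict:
    unsafe: bool
    confidence: float
    path: str  # "fast" or "slow": which stage produced the verdict


def screen(video: dict, fast: Model, slow: Model, threshold: float = 0.9) -> Verdict:
    """Run the cheap screener first; escalate only when it is unsure.

    The threshold-based escalation rule is an assumption made for
    illustration, not the routing rule described by the authors.
    """
    unsafe, conf = fast(video)
    if conf >= threshold:
        return Verdict(unsafe, conf, "fast")
    # Low confidence: spend more compute on the slow reasoning path.
    unsafe, conf = slow(video)
    return Verdict(unsafe, conf, "slow")


# Toy stand-ins for real models (hypothetical, for demonstration only).
def fast_stub(video: dict) -> Tuple[bool, float]:
    score = video["score"]  # pretend unsafe-probability from a small model
    return score > 0.5, abs(score - 0.5) * 2


def slow_stub(video: dict) -> Tuple[bool, float]:
    return video["label"], 1.0  # pretend deliberate reasoning is always sure


if __name__ == "__main__":
    easy = {"score": 0.99, "label": True}
    hard = {"score": 0.60, "label": False}
    print(screen(easy, fast_stub, slow_stub).path)
    print(screen(hard, fast_stub, slow_stub).path)
```

Under this assumed rule, the share of inputs that reach the slow path, and therefore the average inference cost, is controlled by `threshold`.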