SAGE: Segment-Aware Gloss-Free Encoding for Token-Efficient Sign Language Translation
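AI Summary: Proposes a segment-aware visual tokenization framework that uses sign segmentation to convert continuous video into discrete visual tokens. Combined with token-level contrastive alignment and dual-level supervision, it reduces sequence length by 50% while outperforming existing methods on the PHOENIX14T benchmark.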
Comments: Accepted at the International Conference on Computer Vision (ICCV) Workshops. Code released at https://github.com/JianHe0628/SAGE