Committed SAE-Feature Traces for Audited-Session Substitution Detection in Hosted LLMs
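AI summary: Proposes a commit-open protocol that commits sparse autoencoder (SAE) feature traces with a Merkle tree, so that an auditor can detect a hosted LLM provider silently substituting the model it serves.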
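The commit-open mechanism in the summary can be illustrated with a minimal Python sketch of a Merkle commitment over per-token feature traces. This is not the paper's construction. Several details are assumptions made only for illustration. Each leaf encodes a token position plus its top-k SAE `(feature_id, activation)` pairs, with activations rounded to four decimals. Hashing uses SHA-256 with `0x00`/`0x01` prefixes to separate leaves from internal nodes. When a level has an odd number of nodes, its last node is duplicated. The helper names `leaf_hash`, `build_levels`, `open_leaf`, and `verify` are hypothetical.

```python
import hashlib
from typing import List, Sequence, Tuple

def leaf_hash(token_index: int, features: Sequence[Tuple[int, float]]) -> bytes:
    # Hypothetical leaf encoding: token position plus (feature_id, activation)
    # pairs, activations rounded to 4 decimals so a trace serializes deterministically.
    payload = f"{token_index}|" + ",".join(f"{fid}:{act:.4f}" for fid, act in features)
    return hashlib.sha256(b"\x00" + payload.encode()).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    # Prefix 0x01 domain-separates internal nodes from leaves (prefix 0x00).
    return hashlib.sha256(b"\x01" + left + right).digest()

def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    # Returns every tree level, leaves first and the single-element root level last.
    if not leaves:
        raise ValueError("cannot commit to an empty trace")
    levels = [leaves]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2 == 1:
            cur = cur + [cur[-1]]  # assumed rule: duplicate the last node of an odd level
        levels.append([node_hash(cur[i], cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels

def open_leaf(levels: List[List[bytes]], index: int) -> List[Tuple[bytes, bool]]:
    # Sibling path from leaf `index` up to the root; the flag is True if the sibling is on the left.
    proof = []
    for level in levels[:-1]:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof

def verify(root: bytes, leaf: bytes, proof: List[Tuple[bytes, bool]]) -> bool:
    # Recompute the root from the opened leaf and its sibling path.
    h = leaf
    for sibling, sibling_is_left in proof:
        h = node_hash(sibling, h) if sibling_is_left else node_hash(h, sibling)
    return h == root

if __name__ == "__main__":
    # Toy three-token trace of (feature_id, activation) pairs; values are made up.
    trace = [[(12, 0.8), (907, 0.31)], [(12, 0.5)], [(44, 1.2), (3001, 0.07)]]
    leaves = [leaf_hash(i, f) for i, f in enumerate(trace)]
    levels = build_levels(leaves)
    root = levels[-1][0]
    print(verify(root, leaves[2], open_leaf(levels, 2)))
```

In this sketch the provider would publish `root` before the audit. An auditor who samples a token position receives that leaf's feature trace and the sibling path from `open_leaf`, then checks them with `verify`. Changing any committed activation changes the recomputed root, so the opening fails.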
Comments: We identified inaccuracies in two parts of the security analysis: the closed-form intrinsic-dimension lower bound for the feature-forgery attacker (Proposition 4.2, Section 4, Appendix V), and the cross-backend noise calibration of the joint z-score threshold (Section 5.1, Table 2). These errors affect the claimed attack-resistance guarantees. We are withdrawing the paper to correct them before resubmission.