Linguistic Productivity in Large Language Models: Models Coerce, but do not Preempt
Claire Bonial, Claire Benet Post, Laura Michaelis, Harish Tayyar Madabushi
Usage-based theories of grammar posit that the creative productivity of linguistic structures is both bolstered and constrained by two distinct frequency signals: entrenchment, stemming from high-frequency usage, and preemption, stemming from never having observed a particular linguistic structure in a context where one might expect it to appear. Large Language Models (LLMs) are also usage-based, in the sense that they learn the structures of language through exposure to vast amounts of text. Here, we test whether the opposing statistical forces of entrenchment and preemption also encourage and constrain linguistic productivity in LLMs. We demonstrate across model architectures that, in cases of coercion, where the broader constructional context coerces an atypical interpretation of a lexical item, larger models recognize constructional productivity (entrenchment) and can reproduce it with nonce words. However, we also show that even the largest models do not extend negative evidence to novel language, and that statistical preemption does not enable models to avoid overgeneralizing patterns that are semantically felicitous but never observed in data.
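To make the two frequency signals concrete, here is a minimal toy sketch. It is not the authors' method, since the paper tests LLMs rather than counting a corpus. Over a hypothetical hand-built list of observations, it counts how often a verb is attested in a construction, as a stand-in for entrenchment. It also counts how many times the verb appeared where that construction would fit but a competing one was used instead, as a stand-in for statistical preemption. The verbs, the construction labels `ditransitive` and `prep_dative`, the counts, and the function names `entrenchment` and `preemption` are all illustrative assumptions.

```python
# Toy illustration of two usage-based frequency signals.
# All verbs, construction labels, and counts are hypothetical.

# Each record: (verb, construction actually used) in a context expressing
# transfer to a recipient, where either construction could in principle fit.
OBSERVATIONS = [
    ("give", "ditransitive"),      # e.g. "give her the book"
    ("give", "ditransitive"),
    ("give", "prep_dative"),       # e.g. "give the book to her"
    ("send", "ditransitive"),
    ("send", "prep_dative"),
    ("explain", "prep_dative"),    # e.g. "explain the answer to her"
    ("explain", "prep_dative"),
    ("explain", "prep_dative"),
]


def entrenchment(verb: str, construction: str, obs: list[tuple[str, str]]) -> int:
    """Number of times `verb` is attested in `construction`."""
    return sum(1 for v, c in obs if v == verb and c == construction)


def preemption(verb: str, construction: str, obs: list[tuple[str, str]]) -> int:
    """Number of contexts where `construction` could have appeared with `verb`
    but a competing construction was used instead.

    Returns 0 if the verb is attested in `construction` at least once (treated
    as not preempted in this toy) or if the verb was never observed at all
    (no evidence either way, as for a nonce word).
    """
    uses = [c for v, c in obs if v == verb]
    if construction in uses:
        return 0
    return len(uses)


if __name__ == "__main__":
    for verb in ("give", "explain", "blick"):  # "blick" is a hypothetical nonce verb
        print(
            verb,
            entrenchment(verb, "ditransitive", OBSERVATIONS),
            preemption(verb, "ditransitive", OBSERVATIONS),
        )
```

Running it prints `give 2 0`, `explain 0 3`, and `blick 0 0`. In these toy counts, *explain* turns up three times where the ditransitive would fit and never takes it. That is the kind of negative evidence a learner could use to reject *explain her the answer*. The nonce verb *blick* carries no such evidence because it has never been observed.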