Should LLM Agents Decide in Social Simulations? Comparing Finite-State and LLM-Based Decision Policies
Alejandro Buitrago López, Javier Pastor-Galindo, José A. Ruipérez-Valiente
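AI summary: The study evaluates whether LLMs, when used as action selectors in online social network simulations, preserve an interpretable reference policy. It finds that LLMs can approximate the policy in some configurations but do not preserve it reliably, and that they are far slower than Markov chain sampling.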
Large language models (LLMs) are increasingly used as decision-making components in social simulations. This introduces a methodological risk: the simulation may deviate from the explicit behavioral policy defined by the researcher. In online social network (OSN) simulations, action choices shape system dynamics, interaction patterns, and model interpretability. This paper evaluates whether LLM action selectors preserve an interpretable reference policy in an OSN simulation. The reference is a finite state machine implemented as a first-order Markov model, with transition probabilities depending on the user type. The evaluation uses a synthetic network with 1,000 agents and 10,000 action decisions. Three open-weight LLMs are tested: LLaMA 3.1, GPT-OSS, and Mistral 24B. Each model is evaluated under three prompting strategies: base, guided, and probabilistic. Alignment is measured using Jensen-Shannon Divergence with Laplace smoothing, and execution time is reported. Results show that LLMs can approximate the reference policy in some configurations, but do not preserve it reliably. Alignment varies across models and prompts, and additional guidance can introduce systematic action biases. Even the best-aligned LLM configurations are several hundred times slower than direct Markov chain sampling. These findings indicate that LLM-based action selection is not a direct replacement for explicit decision policies: it can alter the intended behavior while increasing computational cost.
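To make the evaluation setup concrete, the following is a minimal sketch of a user-type-dependent first-order Markov action policy and of Jensen-Shannon divergence computed on Laplace-smoothed action frequencies. The abstract does not list the paper's action set, user types, or transition probabilities, so `ACTIONS`, `TRANSITIONS`, the `"casual"` and `"active"` user types, and the smoothing constant `alpha=1.0` are all illustrative assumptions, as is the choice of base-2 logarithms. It is not the authors' implementation.

```python
import math
import random
from collections import Counter

# Hypothetical action set; the paper's actual actions are not given in the abstract.
ACTIONS = ["post", "like", "share", "idle"]

# Hypothetical transition tables: TRANSITIONS[user_type][current_action] holds the
# probabilities of the next action, in the same order as ACTIONS. Each row sums to 1.
TRANSITIONS = {
    "casual": {
        "post":  [0.1, 0.4, 0.1, 0.4],
        "like":  [0.1, 0.3, 0.1, 0.5],
        "share": [0.1, 0.3, 0.1, 0.5],
        "idle":  [0.1, 0.2, 0.1, 0.6],
    },
    "active": {
        "post":  [0.4, 0.3, 0.2, 0.1],
        "like":  [0.3, 0.3, 0.3, 0.1],
        "share": [0.3, 0.3, 0.2, 0.2],
        "idle":  [0.4, 0.3, 0.2, 0.1],
    },
}


def next_action(user_type, current, rng):
    """Sample the next action from the row selected by user type and current action."""
    weights = TRANSITIONS[user_type][current]
    return rng.choices(ACTIONS, weights=weights, k=1)[0]


def simulate(user_type, steps, rng, start="idle"):
    """Run the first-order Markov chain for a number of steps and return the actions."""
    actions, current = [], start
    for _ in range(steps):
        current = next_action(user_type, current, rng)
        actions.append(current)
    return actions


def smoothed_distribution(samples, alpha=1.0):
    """Empirical action distribution with additive (Laplace) smoothing."""
    counts = Counter(samples)
    total = len(samples) + alpha * len(ACTIONS)
    return [(counts[a] + alpha) / total for a in ACTIONS]


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, so the value lies in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the KL sum.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


if __name__ == "__main__":
    # Compare two action streams; in the paper's setting the second stream would
    # come from an LLM action selector rather than from another Markov chain.
    reference = simulate("casual", 10_000, random.Random(0))
    candidate = simulate("active", 10_000, random.Random(1))
    jsd = jensen_shannon_divergence(
        smoothed_distribution(reference), smoothed_distribution(candidate)
    )
    print(f"JSD between reference and candidate: {jsd:.4f}")
```

The smoothing keeps every action probability strictly positive, so an action that a selector never chooses still yields a finite, comparable divergence. This sketch compares marginal action frequencies only; comparing per-state transition rows would follow the same pattern applied row by row.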