arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.28647 2026-05-28 cs.AI cs.CY q-fin.RM

The Ethics of LLM Sandbox and Persona Dynamics

Tim Gebbie, Stewart Gebbie

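AI summary: This paper argues that the reality gap produced by LLM guardrails and persona dynamics constitutes unethical "reality laundering", and proposes addressing it through task-level causal requirements specification rather than response-level moral correction.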

Comments
8 pages
Abstract

It is well known that LLM guardrails and trained persona dynamics can produce a reality gap: the distance between the world an LLM is permitted or shaped to describe, and the world in which users must act. Here we argue that actively generating reality gaps is in fact unethical because it knowingly shifts epistemic risk back to the uninformed user -- this is reality laundering. This can potentially cause harm when operationalised at scale. The risk is sharpest in high-exposure advice contexts, where users seek orientation rather than a bounded, externally checkable task. Guardrails naively appear ethically necessary when they claim to prevent direct harm, but often become suspect when they suppress truthful perception and launder uncomfortable mechanisms into acceptable abstractions. Basel-style financial regulation, B-BBEE-style compliance, Societe Generale, and the London Whale show how formal safety systems can become legible, gameable, and performative while real exposure migrates elsewhere. The same pattern can appear in LLMs as moral compliance: safe language, distorted reality. We therefore distinguish refusing harm from refusing reality, and then argue for top-down causal requirements specification at the task level rather than bottom-up moral correction at the response or sandbox level. Persona dynamics matter because the assistant interface is not neutral; it shapes how uncertainty, conflict, authority, and risk are staged. The conclusion is that so-called "ethical AI" becomes substantively unethical when it substitutes institutional reassurance for contact with reality.

2605.28359 2026-05-28 cs.AI q-fin.TR

From Knowing to Doing: A Memory-Controlled Benchmark for LLM Trading Agents on Stock Markets

Taojie Zhu, Wentao Zhao, Rui Sun, Beidi Luan, Jiacheng Lu, Sinuo Wang, Jing Li, Daxin Jiang, Yonghong He, Zuo Bai

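AI summary: To address knowledge leakage and return attribution in the evaluation of LLM trading agents, the paper proposes the KTD-Fin benchmark, which uses data masking and a Barra-style attribution framework to separate market memory from investment decisions, and finds that returns come mainly from passive market exposure rather than stock-selection skill.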

Abstract

Evaluating whether large language model (LLM) agents can profit in capital markets is increasingly framed as end-to-end trading: place an agent in a historical market, let it trade, and measure portfolio returns. This setup is vulnerable to two evaluation failures. First, long backtests often overlap with the knowledge cutoffs of frontier LLMs, allowing memorized tickers, dates, prices, and market narratives to substitute for investment reasoning. Second, raw returns are a noisy proxy for stock-selection ability, since positive performance may come from market beta, style exposure, or favorable regimes rather than genuine alpha. We introduce KTD-Fin (Knowing-To-Doing Financial Benchmark), an end-to-end stock-market trading benchmark that addresses both issues. KTD-Fin uses a data-side masking protocol to anonymize key identifiers and calendar information consistently across prompts and tools, separating historical market memory from investment decision-making. It also incorporates a Barra-style performance attribution framework that decomposes portfolio returns into market, style, and stock-selection alpha components. Across ten frontier LLM agents evaluated on the Chinese CSI300 over a 2024--2026 window, masking substantially changes agent rationales, pushing them towards anonymized factor-based reasoning. Attribution analysis further shows that LLM agents' cumulative returns under leakage-controlled evaluation are largely explained by passive market and style exposure, with limited evidence of persistent stock-selection alpha. These findings suggest that financial LLM benchmarks should evaluate not only whether an agent makes money, but also whether the source of returns reflects transferable investment skill. We release KTD-Fin as a reproducible template for leakage-controlled and attribution-aware evaluation of LLM trading agents.
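
The return decomposition described above can be illustrated with a deliberately simplified sketch. It assumes a time-series regression of portfolio returns on a market factor and a few style factors; a production Barra-style model works from holdings-level exposures and cross-sectional factor returns, which the abstract does not detail. The function name `attribute_returns` and all inputs are hypothetical.

```python
import numpy as np

def attribute_returns(portfolio, market, styles):
    """Split mean portfolio return into alpha, market, and style parts.

    Assumption: a plain OLS time-series regression
        r_p = alpha + beta_m * r_m + sum_k beta_k * f_k + eps
    stands in for a full Barra-style cross-sectional model.
    portfolio, market: shape (T,); styles: shape (T, K).
    """
    portfolio = np.asarray(portfolio, dtype=float)
    market = np.asarray(market, dtype=float)
    styles = np.asarray(styles, dtype=float)
    # Design matrix: intercept, market factor, then K style factors.
    X = np.column_stack([np.ones_like(portfolio), market, styles])
    coef, *_ = np.linalg.lstsq(X, portfolio, rcond=None)
    alpha, beta_mkt, beta_style = coef[0], coef[1], coef[2:]
    return {
        "alpha_per_period": float(alpha),
        "market_contribution": float(beta_mkt * market.mean()),
        "style_contribution": float(beta_style @ styles.mean(axis=0)),
        "beta_market": float(beta_mkt),
        "beta_style": beta_style,
    }
```

Because the regression includes an intercept, the three contributions sum to the mean portfolio return, so a large mean return with a near-zero `alpha_per_period` points to passive exposure rather than selection skill.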

2605.23955 2026-05-28 cs.AI cs.DC cs.LG cs.SI q-fin.CP

From Accuracy to Auditability: A Survey of Determinism in Financial AI Systems

Ruizhe Zhou, Xiaoyang Liu, Gaoyuan Du, Yi Zheng, Shouxi Ren, Deepayan Chakrabarti, Dengdu Jiang

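AI summary: From a systems perspective, this paper surveys irreproducibility in three modalities of financial AI (tabular models, graph networks, and LLM-based agentic workflows), quantifies determinism metrics experimentally, and proposes a layered evaluation framework.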

Abstract

Deploying machine learning in regulated financial environments -- credit risk, fraud detection, and anti-money laundering -- exposes critical vulnerabilities in algorithmic reproducibility. While early financial ML addressed statistical challenges such as backtest overfitting, deep neural networks and Generative AI have introduced mechanical nondeterminism rooted in hardware and architecture. This survey provides a systems perspective on reproducibility failures across three modalities now dominant in financial AI: tabular models (post-hoc explanation variance), graph networks (stochastic sampling and temporal asynchrony), and LLM-based agentic workflows (batch-dependent divergence and trajectory drift). We supplement the literature analysis with first-party experiments on public financial datasets -- quantifying explanation rank instability in credit scoring, prediction flip rates in GNN-based fraud detection, and tensor-parallel-induced output divergence in LLM entity extraction. We propose a layered evaluation framework linking modality-specific metrics (RBO, D_cos, TDI, PSD) to audit readiness, and empirically validate the complementarity of logit-level and semantic-level determinism measures.
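
As an illustration of how explanation rank instability might be scored, the sketch below computes a truncated rank-biased overlap (RBO) between two feature rankings. It assumes the plain prefix-weighted form without extrapolation; the abstract does not say which RBO variant the survey uses, and the name `rbo_truncated` is hypothetical.

```python
def rbo_truncated(ranking_a, ranking_b, p=0.9):
    """Truncated rank-biased overlap between two rankings.

    Computes (1 - p) * sum_{d=1..k} p**(d-1) * A_d, where A_d is the
    fraction of items shared by the two top-d prefixes and k is the
    shorter list length. Assumes no duplicate items within a ranking.
    """
    depth = min(len(ranking_a), len(ranking_b))
    seen_a, seen_b = set(), set()
    score = 0.0
    for d in range(1, depth + 1):
        seen_a.add(ranking_a[d - 1])
        seen_b.add(ranking_b[d - 1])
        agreement = len(seen_a & seen_b) / d
        score += (p ** (d - 1)) * agreement
    return (1 - p) * score
```

With this truncated form, two identical rankings of length k score 1 - p**k rather than 1, so scores should only be compared at a fixed depth and a fixed p.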

2605.27977 2026-05-28 q-fin.PM q-fin.CP q-fin.MF q-fin.ST

Deep Learning Forecasting of the U.S. Aggregate Bond Index

Ajay Kumar Verma, Jul Jon Ramirez General, Yvan Landry Ndzonde Fonkou

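AI summary: The study forecasts the U.S. aggregate bond index with deep learning methods and finds that the transformation of the series (its stationarity and memory) matters more than model complexity; MLPs perform best on the fractionally differenced series, while CNN-GAF models perform poorly.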

Abstract

This study examines the statistical properties of the U.S. aggregate bond index and its predictability with deep learning methods, using daily observations spanning 2018 to February 2026. We first establish that index levels are extremely persistent and consistent with unit-root behavior (Dickey and Fuller), while log returns are covariance-stationary with weak linear dependence and pronounced volatility clustering characteristic of ARCH-type processes (Engle; Bollerslev). Motivated by the trade-off between stationarity and information retention, we construct a "stationary but maximally persistent" representation via fractional differencing (Granger and Joyeux; Hosking) following the procedure of López de Prado, and evaluate short-horizon forecasts using two neural paradigms: (i) Multilayer Perceptrons (MLPs) trained on lagged vectors with joint lag-length and hyperparameter tuning (Hornik et al.; Rumelhart et al.); and (ii) Convolutional Neural Networks (CNNs) trained on Gramian Angular Field (GAF) image encodings (Wang and Oates). Empirically, MLPs match the strong naive persistence benchmark on levels, collapse toward near-zero forecasts on returns, and achieve the strongest incremental performance on the fractionally differenced series, where moderate dependence remains but unit-root drift is attenuated. In contrast, CNN-GAF models deliver consistently negative out-of-sample $R^2$ across all three representations. Overall, the results imply that, for short-horizon forecasting of broad bond indices, the primary determinant of predictive performance is the transformation of the series (its degree of stationarity and memory) rather than architectural complexity. Lag-based models remain competitive under persistence, while GAF-based CNNs are better suited to pattern-based tasks than to persistence-dominated next-step prediction.
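
For readers unfamiliar with fractional differencing, here is a minimal sketch of the fixed-window variant. It assumes the standard binomial expansion of (1 - B)^d and a hand-picked window length; the paper's choice of d and its truncation rule are not given in the abstract, and the function names are hypothetical.

```python
import numpy as np

def fracdiff_weights(d, n_weights):
    """First n_weights coefficients of (1 - B)**d, B the lag operator.

    Recursion: w_0 = 1, w_k = -w_{k-1} * (d - k + 1) / k.
    """
    w = [1.0]
    for k in range(1, n_weights):
        w.append(-w[-1] * (d - k + 1) / k)
    return np.array(w)

def fracdiff(series, d, n_weights):
    """Fixed-window fractional difference; leading values are NaN."""
    x = np.asarray(series, dtype=float)
    w = fracdiff_weights(d, n_weights)
    out = np.full(x.shape, np.nan)
    for t in range(n_weights - 1, len(x)):
        # Reverse the window so w_0 multiplies x_t, w_1 multiplies x_{t-1}, ...
        window = x[t - n_weights + 1 : t + 1][::-1]
        out[t] = w @ window
    return out
```

Setting d = 1 recovers the ordinary first difference, while 0 < d < 1 gives slowly decaying weights that retain memory of the level series. That retained memory is the stationarity versus information trade-off the abstract refers to.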

2605.27945 2026-05-28 q-fin.PM q-fin.CP q-fin.MF q-fin.PR q-fin.ST

Stochastic Volatility, Jumps, and Rates: A Unified Framework for Option Pricing and Term-Structure Simulation

Nunik Srikandi Putri, Ajay Kumar Verma, Neo Paul Lesupi

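AI summary: This study integrates the Heston, Bates, and CIR models into a unified framework for pricing short- and medium-maturity equity options and assessing interest-rate risk, finding that continuous stochastic volatility dominates short-term pricing while stochastic interest rates materially affect valuations beyond one year.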

Abstract

This study develops an integrated stochastic modeling framework for pricing short- and medium-maturity equity options and assessing interest-rate risk using the Heston (1993), Bates (1996), and CIR (1985) models. We calibrate the Heston model using both the Lewis (2001) Fourier inversion and the Carr-Madan (1999) FFT approach, finding near-identical parameter sets, which is consistent with the calibration stability reported in recent studies such as Agazzotti et al. (2025). Extending the model to Bates shows that jump intensities converge to values effectively equal to zero for 60-day maturities, echoing empirical findings that jumps contribute marginally to short-term smile fitting. We further compare our calibration approach with the joint volatility-surface and variance-term-structure framework proposed by Yoo (2025), confirming that standard Heston/Bates calibration remains robust for the maturities considered. Finally, we calibrate the CIR short-rate model to the Euribor term structure, generating positive and economically consistent forward-rate scenarios in line with recent stochastic-rate option-pricing research by Jeon and Kim (2025). Overall, our results show that continuous stochastic volatility dominates near-term pricing dynamics, while stochastic interest rates materially influence valuations beyond one year.
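
Calibrating a CIR short-rate model to a term structure typically relies on the model's closed-form zero-coupon bond price. The sketch below implements that textbook formula; the parameter values in any call are illustrative assumptions, not the paper's calibrated Euribor parameters, and the function name is hypothetical.

```python
import math

def cir_zero_coupon_price(r0, kappa, theta, sigma, tau):
    """Closed-form CIR price of a zero-coupon bond with maturity tau.

    Model: dr = kappa * (theta - r) dt + sigma * sqrt(r) dW.
    Price: P = A(tau) * exp(-B(tau) * r0).
    """
    gamma = math.sqrt(kappa ** 2 + 2.0 * sigma ** 2)
    exp_gt = math.exp(gamma * tau)
    denom = (gamma + kappa) * (exp_gt - 1.0) + 2.0 * gamma
    B = 2.0 * (exp_gt - 1.0) / denom
    A = (2.0 * gamma * math.exp((kappa + gamma) * tau / 2.0) / denom) ** (
        2.0 * kappa * theta / sigma ** 2
    )
    return A * math.exp(-B * r0)

def feller_condition_holds(kappa, theta, sigma):
    """True when 2 * kappa * theta >= sigma**2, keeping rates strictly positive."""
    return 2.0 * kappa * theta >= sigma ** 2
```

A calibration routine would minimize the distance between these model prices (or the yields they imply) and observed market quotes across maturities.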

2605.27848 2026-05-28 q-fin.PM econ.EM q-fin.CP q-fin.MF q-fin.ST

Regime-Based Portfolio Allocation Using Hidden Markov Models and Reinforcement Learning

Ajay Kumar Verma, Nunik Srikandi Putri, Neo Paul Lesupi

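AI summary: This study combines Hidden Markov Models (HMMs) with reinforcement learning (RL) in a regime-aware portfolio allocation framework that allocates dynamically across equities, long-term Treasuries, and gold, achieving better risk-adjusted returns than a passive benchmark.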

Abstract

This study develops a regime-aware portfolio allocation framework that integrates Markov switching models with Reinforcement Learning (RL) to dynamically allocate across equities (SPY), long-term Treasuries (TLT), and gold (GLD). Using daily ETF data from 2004-2025, we first characterize market behavior through a discrete Markov chain and then estimate a three-state Gaussian Hidden Markov Model (HMM) selected by the Bayesian Information Criterion (BIC). The estimated regimes (low-volatility, transitional, and high-volatility) exhibit strong persistence and state-dependent return dynamics consistent with recent findings on nonlinear market states (Ardia et al., 2024; Gupta & Pierdzioch, 2023). State-conditional analysis shows that SPY dominates in stable regimes, while TLT and GLD provide protection during stressed periods, motivating regime-conditioned allocation rules. We evaluate rule-based rotation and RL-driven strategies using a 30% out-of-sample test window with a one-day execution lag to avoid look-ahead bias. Both HMM-based allocations outperform a passive SPY benchmark, while the RL policy achieves the highest risk-adjusted performance, delivering the strongest Sharpe ratio and materially lower drawdowns, yet remains fully interpretable through discrete regime-dependent actions. Sensitivity analysis confirms the robustness of the three-state specification relative to two-state alternatives. Overall, the results demonstrate that RL can systematically enhance HMM-based regime detection, providing a transparent, adaptive, and empirically grounded framework for tactical asset allocation. The combined HMM-RL system provides a transparent, rules-based approach to tactical allocation that improves risk-adjusted performance relative to standard benchmark strategies.
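
The one-day execution lag mentioned above is easy to get wrong, so a small sketch may help. It assumes regime labels have already been inferred (for example by an HMM) and uses a made-up regime-to-weights table; neither the weights nor the function name come from the paper.

```python
import numpy as np

# Hypothetical regime -> (SPY, TLT, GLD) weights, for illustration only.
REGIME_WEIGHTS = {
    0: (1.0, 0.0, 0.0),    # low-volatility: all equities
    1: (0.5, 0.25, 0.25),  # transitional: mixed
    2: (0.0, 0.5, 0.5),    # high-volatility: defensive
}

def lagged_strategy_returns(regimes, asset_returns, weights=REGIME_WEIGHTS, lag=1):
    """Daily strategy returns when the regime seen on day t sets weights on day t + lag.

    regimes: shape (T,) integer labels; asset_returns: shape (T, 3).
    lag must be >= 1; the first `lag` days are left uninvested (return 0).
    """
    if lag < 1:
        raise ValueError("lag must be at least 1 to avoid look-ahead bias")
    regimes = np.asarray(regimes)
    asset_returns = np.asarray(asset_returns, dtype=float)
    w = np.array([weights[int(s)] for s in regimes])
    strat = np.zeros(len(regimes))
    # Weights chosen from day t's regime earn day t + lag's returns.
    strat[lag:] = np.sum(w[:-lag] * asset_returns[lag:], axis=1)
    return strat
```

Applying the weights to same-day returns instead would let the strategy trade on information it could not have had at the open, which is the look-ahead bias the lag is meant to remove.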

2605.27684 2026-05-28 econ.GN q-fin.EC

Insider and stealth trading with dynamic legal risk

Bixing Qiao, Weixuan Xia

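AI summary: Within a continuous-time Kyle-type framework, this paper studies how insiders strategically respond to dynamic legal risk while using stealth trading strategies, analyzes the equilibrium through a new impact-neutral measure change, and shows how regulation shapes trading strategies, yielding three insights on regulatory impact.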

Comments
43 pages, 3 figures
Abstract

The present paper investigates how insiders strategically navigate ongoing legal risk while leveraging stealth trading within a continuous-time Kyle-type framework. Legal enforcement operates concurrently with trading, a dynamic that can be adversely obscured by a large surrounding population of noise traders. While surveillance intensity responds directly to the insider's trading intensity, triggering a random prosecution time, the resulting legal sanctions encompass both strategy-focused criminal penalties and profit-dependent civil penalties. Employing a new impact-neutral measure change, equilibrium analysis shows that even after achieving stealth, the insider internalizes regulatory exposure, and enforcement can significantly shape equilibrium trading strategies. The associated limiting equilibria yield a rich set of outcomes, with three key insights for regulatory impact: (i) under committed regulatory scrutiny, the insider trades a time-varying function of the discrepancy between the asset's fundamental value and its market price, and trading may intensify indefinitely near the end of the trading horizon as legal risk recedes; (ii) merely raising penalties as an advantageous selection cost proves ineffective in offsetting declines in regulatory diligence; (iii) criminal penalties remain essential for deterring aggressive insider trading, as they impose critical temporal constraints on trading intensity not achievable through civil penalties alone.

2605.27658 2026-05-28 q-fin.MF

Historical Developments in Probability Measures for Asset Pricing: From State Prices to Modern Pricing Kernels

Zhang Chen, Chen Kay

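AI summary: This paper reviews the historical development of probability measures in asset pricing, from early mathematical finance and state price theory to modern data-driven probability transformations, emphasizing that pricing theory represents market prices as expectations by constructing, transforming, or selecting probability measures.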

Abstract

This review summarizes the historical development of probability measures in asset pricing, from early mathematical finance and state price theory to risk-neutral valuation, martingale measures, forward measures, stochastic discount factors, incomplete-market measure selection, benchmark pricing, robust and nonlinear pricing, and modern data-driven probability transformations. The central theme is that asset pricing is not merely an exercise in estimating physical probabilities. Instead, pricing theory constructs, transforms, or selects probability measures so that market prices can be represented as expectations after discounting, numeraire normalization, marginal utility weighting, entropy penalization, calibration, or information conditioning. The paper emphasizes landmark contributions including Bachelier's probabilistic model of speculation, Arrow-Debreu state-contingent claims, Black-Scholes-Merton option pricing, Harrison-Kreps and Harrison-Pliska's martingale formalization, Delbaen and Schachermayer's fundamental theorem, Breeden-Litzenberger implied state price densities, change of numeraire methods, Hansen-Jagannathan stochastic discount factor restrictions, Cochrane's SDF synthesis, and recent empirical and machine learning work on learned pricing kernels. Text-, attention-, and sentiment-based probability transformations are treated as recent information-adjusted forecasting extensions that complement, rather than replace, martingale, numeraire, SDF, and incomplete-market frameworks. The paper also collects key formulas for state prices, stochastic discount factors, Radon-Nikodym densities, Girsanov changes of measure, risk-neutral valuation, forward measures, implied densities, coherent risk measures, benchmark pricing, learned SDFs, and information-adjusted forecasting.
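
Several of the formulas the review says it collects have standard textbook forms, sketched below for orientation. The notation is an assumption ($M$ for the stochastic discount factor, $X$ for a payoff, $Z_T$ for the density process, $\lambda$ for the market price of risk, $P(t,T)$ for the zero-coupon bond price, $C(K,T)$ for a call price with a constant rate $r$) and may differ from the paper's own.

```latex
% Stochastic discount factor pricing under the physical measure P
p_t = \mathbb{E}^{\mathbb{P}}_t\!\left[ M_{t+1} X_{t+1} \right]

% Risk-neutral valuation under Q with short rate r_s
p_t = \mathbb{E}^{\mathbb{Q}}_t\!\left[ e^{-\int_t^T r_s \, ds} X_T \right]

% Radon-Nikodym density linking the two measures
\left. \frac{d\mathbb{Q}}{d\mathbb{P}} \right|_{\mathcal{F}_T} = Z_T,
\qquad
\mathbb{E}^{\mathbb{Q}}[X_T] = \mathbb{E}^{\mathbb{P}}[Z_T X_T]

% Girsanov change of measure with market price of risk \lambda
Z_T = \exp\!\left( -\int_0^T \lambda_s \, dW^{\mathbb{P}}_s
      - \tfrac{1}{2} \int_0^T \lambda_s^2 \, ds \right),
\qquad
W^{\mathbb{Q}}_t = W^{\mathbb{P}}_t + \int_0^t \lambda_s \, ds

% T-forward measure with the zero-coupon bond as numeraire
p_t = P(t,T) \, \mathbb{E}^{\mathbb{Q}^T}_t\!\left[ X_T \right]

% Breeden-Litzenberger implied risk-neutral density
q(K) = e^{rT} \, \frac{\partial^2 C(K,T)}{\partial K^2}
```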

2605.17117 2026-05-28 q-fin.ST

Geometric Observables for Financial Regime Detection

Will Hammond

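AI summary: The paper extracts four geometric observables (Berry Phase Rate, Spectral Entropy, Reduced State Purity, and Hamiltonian Sensitivity) from a spectral embedding of equity-index returns and compares them with 46 classical and machine-learning baselines on 17 historical crises from 2000 to 2024, finding that the Berry Phase Rate attains an out-of-sample median Cohen's d = 0.72 and produces about 67% fewer false alarms per year than a Random Forest.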

Comments
25 pages, 10 figures, 1 table. Code and data: https://github.com/willhammondhimself/qcml-geometric-sde
Abstract

We extract four geometric observables -- Berry Phase Rate, Spectral Entropy, Reduced State Purity, and Hamiltonian Sensitivity -- from a learned spectral embedding of equity-index returns and evaluate them as regime-shift detectors against 46 classical and machine-learning baselines on 17 historical crises spanning 2000-2024. Under walk-forward nested hyperparameter selection on nine labelled windows, the Berry Phase Rate achieves an unbiased out-of-sample median Cohen's $d = 0.72$ (95% percentile-bootstrap CI $[0.34, 1.18]$, 10,000 resamples) and produces approximately 67% fewer false alarms per year than a label-supervised Random Forest (1.2 vs. 3.6 per year). Reduced State Purity attains the highest in-sample separability of any method ($d = 0.83$), closely followed by the Absorption Ratio ($d = 0.80$); geometric and classical channels are largely uncorrelated (mean $|\rho| \approx 0.22$), suggesting they capture distinct risk signals. Score construction is unsupervised; hyperparameter selection is the only supervised step.
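
The effect-size statistics quoted above can be reproduced in spirit with a short sketch of Cohen's d and a percentile bootstrap. It assumes a pooled-standard-deviation definition of d and resampling of individual observations; the paper's exact resampling unit (observations versus crisis windows) is not stated in the abstract, and the function names are hypothetical.

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d with pooled standard deviation (sample variances, ddof=1)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

def bootstrap_ci(a, b, n_resamples=10_000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for Cohen's d."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    stats = np.empty(n_resamples)
    for i in range(n_resamples):
        # Resample each group with replacement, keeping group sizes fixed.
        stats[i] = cohens_d(rng.choice(a, len(a)), rng.choice(b, len(b)))
    lo, hi = np.percentile(stats, [100 * (1 - level) / 2, 100 * (1 + level) / 2])
    return lo, hi
```

Here `a` would hold detector scores inside crisis windows and `b` the scores outside them, so a larger d means the detector separates the two regimes more cleanly.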

2605.21743 2026-05-28 cs.AI econ.GN q-fin.EC

Who Uses AI? Platform Selection and the Measurement of Occupational AI Exposure

Michelle Yin, Burhan Ogut

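AI summary: By analyzing conversation logs from AI platforms, this paper shows that the composition of a platform's user base biases measures of occupational AI exposure, and proposes a workforce-reweighted partial-identification method to correct the estimates.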

Abstract

Conversation logs from AI platforms are increasingly used to measure occupational exposure to artificial intelligence, but the users observed in these logs are not the workforce. We show that platform-derived exposure scores combine task-level AI applicability with the occupational composition of the platform's user base. Holding the empirical design fixed, changing only the platform input changes the post-ChatGPT employment coefficient by a factor of 1.9, and consumer and enterprise channels within the same vendor disagree in sign. We formalize the resulting non-classical measurement error, decompose it into between- and within-occupation selection, and construct workforce-reweighted partial-identification bounds. Reweighting to Bureau of Labor Statistics employment shares attenuates estimates by 42 to 93 percent. The bias captures augmentation among observed users more directly than substitution in the workforce.
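
The composition effect described above can be seen in a toy calculation. The sketch assumes per-occupation exposure scores are fixed and only the occupation weights change; it illustrates between-occupation reweighting only, not the paper's partial-identification bounds, and all names and numbers are invented.

```python
def weighted_exposure(exposure_by_occ, shares_by_occ):
    """Average exposure score under a given set of occupation shares.

    Shares are normalized internally, so they need not sum to one.
    """
    total = sum(shares_by_occ.values())
    return sum(exposure_by_occ[occ] * share / total
               for occ, share in shares_by_occ.items())

# Illustrative numbers only: a platform whose users skew toward developers.
exposure = {"developer": 0.8, "nurse": 0.2}
platform_shares = {"developer": 0.7, "nurse": 0.3}
workforce_shares = {"developer": 0.1, "nurse": 0.9}

platform_avg = weighted_exposure(exposure, platform_shares)
workforce_avg = weighted_exposure(exposure, workforce_shares)
```

In this example the platform-weighted average exceeds the workforce-weighted one purely because of who uses the platform, which is the kind of gap that reweighting to employment shares is meant to close.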

2603.09303 2026-05-28 q-fin.PM

Investor risk profiles of large language models

Hanyong Cho, Geumil Bae, Jang Ho Kim

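AI summary: This study assesses the risk preferences of three large language models (GPT, Gemini, and Llama) using a standardized risk questionnaire, finding that they are generally long-term investors with differing risk tolerance, and that each model adjusts its risk profile when assigned a specific persona.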

Comments
Poster presented at the AI for Finance Symposium '25, The 6th ACM International Conference on AI in Finance (ICAIF '25)
Abstract

This paper investigates how large language models (LLMs) form and express investor risk profiles, a critical component of retail investment advising. We examine three LLMs (GPT, Gemini, and Llama) and assess their responses to a standardized risk questionnaire under varying prompts. In particular, we establish each model's default investment profile by analyzing repeated responses per model. We observe that LLMs are generally long-term investors but exhibit different tendencies in risk tolerance: Gemini has a moderate risk level with highly consistent responses, Llama skews more conservative, and GPT appears moderately aggressive with the greatest variation in answers. Moreover, we find that assigning specific personas such as age, wealth, and investment experience leads each LLM to adjust its risk profile, although the extent of these adjustments differs across the models.

2603.09301 2026-05-28 q-fin.PM

Constructing a Portfolio Optimization Benchmark Framework for Evaluating Large Language Models

Hanyong Cho, Jang Ho Kim

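AI summary: This study proposes a benchmark framework that evaluates the financial decision-making ability of large language models through portfolio optimization problems with mathematically explicit solutions; experiments show that GPT-4 performs best on risk-based objectives, while Llama 3.1-70B has the lowest overall performance.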

Comments
Poster presented at the AI for Finance Symposium '25, The 6th ACM International Conference on AI in Finance (ICAIF '25)
Abstract

This study introduces a benchmark framework for evaluating the financial decision-making capabilities of large language models (LLMs) through portfolio optimization problems with mathematically explicit solutions. Unlike existing financial benchmarks that emphasize language-processing tasks, the proposed framework directly tests optimization-based reasoning in investment contexts. A large set of multiple-choice questions is generated by varying objectives, candidate assets, and investment constraints, with each problem designed to include a unique correct solution and systematically constructed alternatives. Experimental results comparing GPT-4, Gemini 1.5 Pro, and Llama 3.1-70B reveal distinct performance patterns: GPT achieves the highest accuracy in risk-based objectives and remains stable under constraints, Gemini performs well in return-based tasks but struggles under other conditions, and Llama records the lowest overall performance. These findings highlight both the potential and current limitations of LLMs in applying quantitative reasoning to finance, while providing a scalable foundation for developing LLM-based services in portfolio management.
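
One example of a portfolio problem with a mathematically explicit solution is the global minimum-variance portfolio, sketched below. Whether the benchmark includes this exact problem is an assumption; it is shown only to make concrete what a unique, checkable answer looks like. The function name is hypothetical.

```python
import numpy as np

def min_variance_weights(cov):
    """Fully invested global minimum-variance weights.

    Closed form: w = inv(Sigma) @ 1 / (1' @ inv(Sigma) @ 1),
    with short positions allowed and no other constraints.
    """
    cov = np.asarray(cov, dtype=float)
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)  # inv(Sigma) @ 1 without an explicit inverse
    return raw / raw.sum()
```

For two uncorrelated assets the weights are inversely proportional to their variances, which makes it straightforward to construct one correct option and several systematically wrong alternatives for a multiple-choice item.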

2602.21869 2026-05-28 physics.soc-ph physics.app-ph physics.data-an q-fin.ST

A Bayesian approach to out-of-sample network reconstruction

Mattia Marzi, Tiziano Squartini

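AI summary: The paper proposes a Bayesian approach that uses information from past network snapshots to build a prior, predicts future network structure while quantifying uncertainty, and accurately recovers the number of connections in an interbank deposit market.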

Comments
26 pages, 13 figures
Abstract

Networks underpin systems that range from finance to biology, yet their structure is often only partially observed. Current reconstruction methods typically fit the parameters of a model anew to each snapshot, thus offering no guidance to predict future configurations. Here, we develop a Bayesian approach that uses the information about past network snapshots to inform a prior and predict the subsequent ones, while quantifying uncertainty. Instantiated with a single-parameter fitness model, our method infers link probabilities from node strengths and carries information forward in time. When applied to the Electronic Market for Interbank Deposit across the years 1999-2012, our method accurately recovers the number of connections per bank at subsequent times, outperforming probabilistic benchmarks designed for analogous link-prediction tasks. Notably, each predicted snapshot serves as a reliable prior for the next one, thus enabling self-sustained, out-of-sample reconstruction of evolving networks with a minimal amount of additional data.
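
To make the idea of inferring link probabilities from node strengths concrete, the sketch below uses a functional form common in the network-reconstruction literature, p_ij = z s_i s_j / (1 + z s_i s_j). Treating this as the paper's single-parameter fitness model is an assumption, since the abstract does not state the formula, and the function names are hypothetical.

```python
import numpy as np

def link_probabilities(strengths, z):
    """Pairwise link probabilities p_ij = z*s_i*s_j / (1 + z*s_i*s_j).

    strengths: node strengths (e.g. total exposure of each bank);
    z: the single free parameter controlling overall density.
    Self-loops are excluded by zeroing the diagonal.
    """
    s = np.asarray(strengths, dtype=float)
    x = z * np.outer(s, s)
    p = x / (1.0 + x)
    np.fill_diagonal(p, 0.0)
    return p

def expected_degrees(strengths, z):
    """Expected number of connections per node under the model."""
    return link_probabilities(strengths, z).sum(axis=1)
```

In a Bayesian treatment of the kind the abstract describes, z would carry a prior informed by earlier snapshots rather than being refit from scratch at each date.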

2602.18481 2026-05-28 q-fin.TR cs.AI

AlphaForgeBench: Benchmarking End-to-End Trading Strategy Design with Large Language Models

Wentao Zhang, Mingxuan Zhao, Jincheng Gao, Jieshun You, Huaiyu Jia, Yilei Zhao, Bo An, Shuo Sun

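AI summary: The paper proposes the AlphaForgeBench framework, which redefines LLMs as quantitative researchers rather than stochastic trading agents; by having models generate executable alpha factors and factor-based trading strategies, it eliminates execution-induced instability and enables reproducible evaluation of financial reasoning.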

Abstract

The rapid advancement of Large Language Models (LLMs) has led to a surge of financial benchmarks, evolving from static knowledge evaluation toward interactive trading simulations. However, existing frameworks for evaluating real-time trading largely overlook a critical failure mode: the severe behavioral instability of LLMs in sequential decision-making under financial uncertainty. Through extensive experiments, we show that when deployed as trading agents, LLMs exhibit extreme run-to-run variance, generate inconsistent action sequences even under deterministic decoding, and frequently produce irrational action flipping across adjacent time steps. We attribute these behaviors to the stateless autoregressive nature of LLMs, which lack persistent memory of prior actions, together with their sensitivity to continuous-to-discrete action mappings in portfolio allocation tasks. These deficiencies fundamentally undermine the reliability and reproducibility of many existing online and offline trading benchmarks. To address these limitations, we propose AlphaForgeBench, a principled evaluation framework that redefines LLMs as quantitative researchers rather than stochastic trading agents. Instead of producing discrete trading actions, AlphaForgeBench requires models to generate executable alpha factors and compose factor-based trading strategies grounded in financial knowledge. This paradigm decouples reasoning from execution mechanics, enabling deterministic and reproducible evaluation while remaining aligned with real-world quantitative research workflows. Extensive experiments across multiple state-of-the-art LLMs demonstrate that AlphaForgeBench eliminates execution-induced instability and provides a rigorous benchmark for evaluating financial reasoning, strategy formulation, and alpha discovery. Webpage at https://finbrain-lab-hkustgz.github.io/AlphaForgeBench

2601.13493 2026-05-28 math.OC math.FA math.PR q-fin.MF

Infinite-Dimensional LQ Mean Field Games with Common Noise: Small and Arbitrary Finite Time Horizons

Hanchao Liu, Dena Firoozi

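AI summary: For infinite-dimensional linear-quadratic mean field games with common noise, the paper establishes existence and uniqueness via coupled forward-backward stochastic evolution equations, proves the ε-Nash property of the equilibrium strategy for small time horizons, and extends well-posedness to arbitrary finite time horizons.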

Comments
27 pages
Abstract

We develop the theory of linear-quadratic (LQ) mean field games (MFGs) in Hilbert spaces with common noise modeled by an infinite-dimensional Wiener process that affects the dynamics of all agents. In the presence of common noise, the mean-field consistency condition is characterized by a system of coupled forward-backward stochastic evolution equations (FBSEEs) in Hilbert spaces, whereas in its absence it is represented by coupled forward-backward deterministic evolution equations. We establish the existence and uniqueness of solutions to the coupled linear FBSEEs associated with the LQ MFG framework for small time horizons and prove the $\varepsilon$-Nash property of the resulting equilibrium strategy. Furthermore, we establish the well-posedness of these coupled linear FBSEEs for arbitrary finite time horizons. Beyond the specific context of MFGs, our analysis also yields a broader contribution by providing, to the best of our knowledge, the first well-posedness result for a class of infinite-dimensional linear FBSEEs, for which only mild solutions exist, over arbitrary finite time horizons.