Competition in Dealer Markets with Internalisation and Externalisation
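AI Summary: Using a variational approach, this paper derives a closed-form Nash equilibrium for multiple dealers competing through dynamic quotes, revealing how internalisation and externalisation strategies affect market spreads and dealers' hedging costs.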
Robert Boyce, Eyal Neuman
We model a market with multiple dealers who compete for client order flow by dynamically updating their bid and ask quotes for a risky asset. Dealers aim to maximise expected profits while controlling inventory risk by skewing their quotes to attract offsetting order flow (internalisation) or by directly offloading positions in the market (externalisation). Using a variational approach, we derive a closed-form equilibrium for the resulting Nash competition, shedding light on key features of dealer market dynamics. We show that dealers relying on internalisation are compelled to increase their externalisation activity when competing with externalising dealers. This strategic shift in equilibrium leads to significantly higher hedging costs for all dealers and substantially wider spreads for clients.
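As a toy illustration of the two inventory-control levers described above, the sketch below skews a symmetric quote against inventory (internalisation) and splits a position into a part offloaded in the market (externalisation). The linear skew rule, the parameters `skew_coeff` and `externalise_fraction`, and the function names are all hypothetical; this is not the paper's equilibrium, which is derived with a variational approach.

```python
def skewed_quotes(mid, half_spread, skew_coeff, inventory):
    """Return (bid, ask) shifted against the dealer's inventory.

    A long dealer (inventory > 0) lowers both quotes so that clients are
    more likely to buy from it (offsetting flow); a short dealer raises them.
    """
    centre = mid - skew_coeff * inventory  # inventory-adjusted reference price
    return centre - half_spread, centre + half_spread


def split_hedge(inventory, externalise_fraction):
    """Split a position into the part offloaded in the market (externalised)
    and the part left to be worked off through skewed quotes (internalised)."""
    external = externalise_fraction * inventory
    return external, inventory - external
```

With `mid=100.0`, `half_spread=0.05`, `skew_coeff=0.01` and a long inventory of 10, both quotes drop by 0.1 while the spread is unchanged.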
Multi-Scale Markov-Switching GARCH
Jayesh Chaudhary
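AI Summary: To address the non-stationarity of financial volatility, this paper proposes a triple-timeframe MS-GARCH model that independently estimates AR(1)-MS-GARCH models on daily, four-hour, and hourly timescales, uses Filardo time-varying transition probabilities with composite stress indicators, and builds a 27-state cross-scale probability tensor, outperforming conventional GARCH in volatility forecasting on EUR/USD data.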
Financial volatility exhibits substantial non-stationarity, making single-regime models inadequate for characterising changing market conditions. This paper proposes a triple-timeframe Markov-Switching GARCH (MS-GARCH) framework for volatility regime detection in EUR/USD across daily, four-hour, and hourly horizons. Three independent AR(1)-MS-GARCH models are estimated to capture macro, meso, and micro regime dynamics, while Filardo-style time-varying transition probabilities (TVTP) are incorporated at the shorter horizons through composite stress indicators. The resulting regime probabilities are combined through an outer-product construction into a 27-state cross-scale probability tensor. Using EUR/USD data from 2015-2025, the framework produces statistically distinct Calm, Turbulent, and Crisis regimes and achieves superior out-of-sample volatility forecasting performance relative to a conventional GARCH benchmark. The results suggest that volatility dynamics contain meaningful structure across multiple timescales and that modelling these scales separately provides a more informative representation of market conditions than a single-timescale approach.
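The outer-product step can be sketched as follows, assuming each timeframe's model outputs a probability vector over the three regimes (Calm, Turbulent, Crisis). A plain outer product treats the three timeframes' probabilities as independent when forming the joint table; the function name and the toy probabilities are illustrative.

```python
from itertools import product

REGIMES = ("Calm", "Turbulent", "Crisis")


def cross_scale_tensor(p_daily, p_4h, p_1h):
    """Combine three per-timeframe regime probability vectors into a
    3 x 3 x 3 = 27-state joint table via an outer product.

    The joint probability of (i, j, k) is p_daily[i] * p_4h[j] * p_1h[k]."""
    for p in (p_daily, p_4h, p_1h):
        if len(p) != 3 or abs(sum(p) - 1.0) > 1e-9:
            raise ValueError("each input must be a 3-state probability vector")
    return {
        (REGIMES[i], REGIMES[j], REGIMES[k]): p_daily[i] * p_4h[j] * p_1h[k]
        for i, j, k in product(range(3), repeat=3)
    }
```

For example, with `[0.7, 0.2, 0.1]`, `[0.5, 0.3, 0.2]` and `[0.6, 0.3, 0.1]` the most likely joint state is Calm on all three scales, with probability 0.7 × 0.5 × 0.6 = 0.21.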
Unstructured Claims Data Analysis Using Large Language Models
Robert D. Lieberthal, Richard Tran, Vietbao Phan, Jawand Singh, Elizabeth Sottung
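AI Summary: Proposes a two-stage processing framework that uses large language models to extract structured actuarial variables from unstructured claims data, and demonstrates its practical value through chain ladder reserving.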
Actuaries rely primarily on structured numerical data for reserving and ratemaking, while valuable predictive information in unstructured text, including medical records, adjuster notes, and call transcripts, remains largely unused. Manual processing of these documents is time-consuming, inconsistent across reviewers, and unscalable. We present a proof-of-concept framework using large language models (LLMs) to extract structured actuarial variables from unstructured claims data. We implement a two-stage processing architecture separating document-level extraction (Stage 1) from claim-level synthesis (Stage 2). A modular four-script Python pipeline processes synthetic FHIR-based claims data and real claims documents, extracting 36 actuarial variables across reserving, ratemaking, and claims management categories. We validate 14 core variables using two independent clinical expert reviewers scoring 20 synthetic claims on a five-point Likert rubric, achieving mean scores above 4.0 and a weighted kappa of 0.53. Integration with chain ladder reserving demonstrates practical actuarial value: severity-segmented analysis reduced reserve estimation error from 6.5% to 4.0%. The open-source implementation includes audit trails and confidence scoring, providing a replicable foundation for LLM-based actuarial variable extraction in property-casualty insurance.
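For readers less familiar with chain ladder reserving, here is a minimal volume-weighted chain ladder on a cumulative loss triangle, plus a severity-segmented variant that runs the method separately per segment and sums the reserves. This is the textbook method, not the paper's pipeline; the toy triangle and segment labels are hypothetical, and how the paper assigns claims to severity segments (via LLM-extracted variables) is not reproduced here.

```python
def chain_ladder(triangle):
    """Project ultimate losses from a cumulative loss triangle.

    triangle[i] holds cumulative losses for accident year i; later accident
    years have fewer observed development periods (upper-left triangle).
    Returns (age-to-age factors, ultimates, total reserve)."""
    n = len(triangle)
    factors = []
    for j in range(n - 1):
        # volume-weighted factor from the rows that observe both j and j + 1
        rows = [r for r in triangle if len(r) > j + 1]
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    ultimates = []
    for row in triangle:
        value = row[-1]
        for f in factors[len(row) - 1:]:  # develop from the latest observed age
            value *= f
        ultimates.append(value)
    reserve = sum(u - row[-1] for u, row in zip(ultimates, triangle))
    return factors, ultimates, reserve


def segmented_reserve(triangles_by_segment):
    """Run chain ladder per severity segment and sum the segment reserves."""
    return sum(chain_ladder(t)[2] for t in triangles_by_segment.values())
```

On the toy triangle `[[100, 150, 165], [110, 165], [120]]` the factors are 1.5 and 1.1 and the total reserve is 94.5. Segmenting helps when segments develop at different speeds, because pooled factors then mix incompatible development patterns.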
Forecasting Volatility and Risk Premia in Electricity Markets
Thomas K. Kloster, Fred Espen Benth
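AI Summary: Studies forecasting of realized covariation in electricity markets with a parsimonious matrix-HAR model, finding that incorporating longer time horizons and renewable generation information significantly improves predictive power, and uses the variance forecasts to improve forecasts of spread risk premia in forward markets.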
We study forecasting of the realized covariation in electricity markets. The realized covariation in this context is a matrix-valued representation of the latent infinite-dimensional covariance operator and a parsimonious matrix-HAR type model is constructed to facilitate estimation. We test the model on one-week ahead forecasts of the weekly realized covariation and find that the inclusion of longer time horizons and renewable generation information adds important predictive power. We also investigate the prediction of risk premia in electricity forward markets and find that our variance forecasts provide substantially improved forecasts of spread risk premia compared to standard methods relying on backward looking volatility.
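The HAR idea of regressing next week's realized measure on averages over several past horizons can be sketched in scalar form. This is a simplification under stated assumptions: the paper's model is matrix-valued (it forecasts a realized covariation matrix), whereas the sketch below forecasts a single realized-variance series; the horizons (1, 4 and 12 weeks) and function names are hypothetical.

```python
def har_features(rv, horizons=(1, 4, 12)):
    """HAR regressors: for each target week t, an intercept plus the average
    realized variance over the previous h weeks for every h in `horizons`."""
    start = max(horizons)
    X, y = [], []
    for t in range(start, len(rv)):
        X.append([1.0] + [sum(rv[t - h:t]) / h for h in horizons])
        y.append(rv[t])
    return X, y


def ols(X, y):
    """Least squares via the normal equations, solved by Gauss-Jordan
    elimination with partial pivoting."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # pivot row
        A[c], A[p], b[c], b[p] = A[p], A[c], b[p], b[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * ac for a, ac in zip(A[r], A[c])]
                b[r] -= f * b[c]
    return [b[i] / A[i][i] for i in range(k)]


def har_forecast(rv, beta, horizons=(1, 4, 12)):
    """One-week-ahead forecast from the most recent observations."""
    feats = [1.0] + [sum(rv[-h:]) / h for h in horizons]
    return sum(bi * fi for bi, fi in zip(beta, feats))
```

Typical use is `beta = ols(*har_features(rv))` followed by `har_forecast(rv, beta)`; extra regressors such as renewable generation would simply be appended as further columns.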
Derivative-Informed Operator Learning for Finance: On-the-Fly Greeks, Surfaces, Hedging, and Control
Miquel Noguer I Alonso
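AI Summary: Proposes a derivative-informed operator-learning framework that trains neural operators, random-feature operators, or finite-dimensional surrogates to match both high-fidelity pricing/risk operators and their directional Fréchet derivatives, improving the accuracy of derivative sensitivities (such as delta and vega) in financial decision systems; error bounds and experimental validation are provided.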
Financial decision systems require fast surrogate models for pricing, calibration, hedging, XVA, stress testing, and portfolio optimization. Standard neural surrogates reproduce prices or risk quantities, but downstream tasks depend as much on derivatives: deltas, vegas, curve and credit-spread sensitivities, exposure and objective gradients. We formulate a derivative-informed operator-learning framework in which the learned map (a neural operator, random-feature operator, or finite-dimensional surrogate) is trained both to match a high-fidelity pricing or risk operator and to match directional Fréchet derivatives generated on the fly. The framework combines operator learning, adjoint algorithmic differentiation, tangent sensitivity equations, random sketching of Jacobian actions, and no-arbitrage constraints. We derive error bounds showing that derivative accuracy controls local stress errors, hedging error, and optimizer instability, and that discrete-time hedging error is also governed by second-order (gamma) accuracy. A Black-Scholes network over eight seeds shows that a tuned derivative weight cuts vega error by 40% and delta error by 15% while modestly improving prices, but does not improve an unsupervised second-order Greek. Heston and Bates random-feature experiments reduce stochastic-volatility and jump-parameter sensitivity errors by 60-76%. A random-feature DeepONet/Galerkin operator mapping instantaneous-volatility curves to dense price surfaces reduces out-of-sample JVP error by 44% and price RMSE by 23% over eight seeds; it also shows that derivative consistency alone does not remove no-arbitrage violations, so economic constraints must be imposed explicitly. The framework gives a disciplined route from value-only surrogates to derivative-aware engines that output differentiable instruments for hedging, risk, and control.
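In the simplest finite-dimensional case, a derivative-informed objective adds a weighted penalty on derivative errors to the usual value error. The sketch below assumes the derivative target is the Black-Scholes delta (a closed-form stand-in for the directional Fréchet derivatives the paper generates by AAD or tangent equations) and that `lam` plays the role of the tuned derivative weight; the function names are illustrative.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S, K, T, r, sigma):
    """Black-Scholes call price and its delta (dPrice/dS)."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    price = S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)
    return price, norm_cdf(d1)


def derivative_informed_loss(pred_prices, pred_deltas, true_prices, true_deltas, lam):
    """Mean squared price error plus lam times mean squared delta error."""
    n = len(true_prices)
    price_mse = sum((p - q) ** 2 for p, q in zip(pred_prices, true_prices)) / n
    delta_mse = sum((p - q) ** 2 for p, q in zip(pred_deltas, true_deltas)) / n
    return price_mse + lam * delta_mse
```

Setting `lam = 0` recovers a value-only surrogate; the surrogate's own deltas would come from differentiating the network, which this sketch leaves out.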
The Impact of Market Informedness on Market-Maker Profitability
Konrad Ochędzan, Nino Antulov-Fantulin
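AI Summary: Uses a multi-agent reinforcement learning framework to study how market informedness affects market-maker profitability. Informed order flow causes severe adverse-selection risk in poorly informed markets, but overall the price-discovery effect of higher informedness offsets the negative impact of adverse selection, so market-maker profitability trends upward.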
This paper examines the impact of market informedness on the profitability of market makers. In contrast to the existing literature, the analysis is conducted in a complex market environment featuring heterogeneous market-making agents that differ in terms of information sets and aversion to inventory risk, endogenous price formation, exogenous fundamental value dynamics, and self-exciting market order flow. The paper also establishes finite-horizon stability guarantees for the resulting state-dependent Hawkes market-taker process, including non-explosion, exponential mispricing integrability, occupation-time bounds, and a pathwise mispricing tail estimate. To address the market-making problem, the study employs a reinforcement learning framework based on the multi-agent proximal policy optimization (MAPPO) algorithm in a centralized training with decentralized execution (CTDE) setting. The study shows that informed market order flow is particularly dangerous in poorly informed markets, leading to severe adverse-selection risk. Although the complex market dynamics together with stochastic training give rise to locally non-monotonic outcomes, the results nevertheless reveal an overall upward trend in market makers' profitability as market informedness increases, suggesting that price discovery resulting from higher market informedness offsets the negative impact of adverse selection.
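Self-exciting order flow can be illustrated with a minimal simulation, assuming a linear Hawkes process with constant baseline `mu` and exponential kernel `alpha * exp(-beta * s)`, simulated by Ogata's thinning algorithm. The paper's market-taker process is state-dependent (its intensity reacts to market state such as mispricing), which this sketch does not model; parameter and function names are illustrative. For this constant-baseline case, stationarity requires the branching ratio `alpha / beta` to be below 1.

```python
import math
import random


def hawkes_intensity(t, events, mu, alpha, beta):
    """Intensity at time t given strictly earlier event times."""
    return mu + sum(alpha * math.exp(-beta * (t - s)) for s in events if s < t)


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Ogata thinning: propose from an upper bound, accept with the ratio."""
    rng = random.Random(seed)
    events, t = [], 0.0
    while True:
        # Between events the intensity only decays, so its current value
        # (including the jump of an event just accepted at t) is an upper bound.
        lam_bar = mu + sum(alpha * math.exp(-beta * (t - s)) for s in events)
        t += rng.expovariate(lam_bar)
        if t >= horizon:
            return events
        if rng.random() * lam_bar <= hawkes_intensity(t, events, mu, alpha, beta):
            events.append(t)
```

Each accepted event raises the intensity by `alpha`, which is what produces the clustering of market orders.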
Zero-Copy Semantic Contagion: An In-Memory Streaming Architecture for Evolving Attention Graphs
Kabir Murjani
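AI Summary: Proposes a heterogeneous Rust-Python streaming architecture that uses zero-copy parsing and a neural Hawkes process to build cross-company attention graphs and run inference on them in real time, achieving a 1.70× precision lift over a random baseline on the FNSPID corpus.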
Per-ticker forecasting models dominate financial time-series work yet remain blind to cross-company propagation: a foundry disruption in Taiwan does not register in a single-asset model until Apple's own price has already moved. To address this limitation, we introduce a heterogeneous Rust-Python streaming architecture that maps cross-company attention as a continuous-time graph driven directly from text. We show that on the ingestion side, a zero-copy Rust edge parses news records in ~100 ns and scans the target equity universe in ~1.2 μs. On the inference side, a multivariate Neural Hawkes Process featuring per-node continuous-time LSTM states and a bilinear latent projection propagates directed excitation, while an adaptive pruning rule bounds the computational cost of dynamic neighborhood updates. Combining these stages, we demonstrate an end-to-end processing latency of ~13 ms per incoming news record on a single commodity CPU. Evaluated on a one-month temporal holdout of the FNSPID corpus (638 articles across 47 tickers), the system delivers a 1.70× precision lift over random at the 90th-percentile next-day return threshold, and 3.36× over a same-sector baseline. Crucially, removing the graph topology collapses precision to zero, confirming that the dynamic attention network is the sole driver of cross-company signal in this architecture.
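To make the lift figures concrete: if "hits" are the top 10% of next-day returns, a random flagger has a precision of about 0.10, so a 1.70× lift corresponds to a precision of roughly 0.17. The sketch below computes this ratio; the function name, ranking hits by raw return, and the toy data are assumptions, not the paper's evaluation code.

```python
def precision_lift(flagged, returns, top_fraction=0.1):
    """Precision of the flagged keys at hitting the top `top_fraction` of
    returns, divided by the precision expected from random flagging.

    `returns` maps a key (e.g. a ticker-day) to its next-day return."""
    if not flagged:
        return 0.0
    n_hits = max(1, round(len(returns) * top_fraction))
    hits = set(sorted(returns, key=returns.get, reverse=True)[:n_hits])
    base_rate = n_hits / len(returns)  # precision of a random flagger
    precision = len(hits & set(flagged)) / len(flagged)
    return precision / base_rate
```

With 20 toy keys, the top two count as hits (base rate 0.10); flagging four keys of which one is a hit gives precision 0.25 and a lift of 2.5.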
Stress-Amplified Resilience: ESG and Joint Fragility in Equity Markets
Minxuan Hu, Jiayu Yi, Ziheng Chen, Wenxi Sun, Qishi Zhan
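AI Summary: Using S&P 500 constituent data from 2014-2025, this paper studies whether ESG is associated with lower joint market fragility (the simultaneous occurrence of downside returns, volatility, and illiquidity), finding that ESG amplifies resilience through multiple channels during stress periods rather than providing an unconditional premium.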
Market stress rarely harms investors through one channel alone. Losses, volatility spikes, and deteriorating tradability often arrive together. We examine whether ESG is associated with lower exposure to clustered fragility in equity markets. Using monthly data on S&P 500 constituents from 2014 to 2025, we study downside returns, volatility, illiquidity, and a cofragility state that captures their joint occurrence within the same firm-month. The evidence supports a stress-amplified resilience interpretation rather than an unconditional ESG return premium. In the return channel, the ESG association is concentrated in the extreme downside tail during stress months. In the volatility channel, higher ESG is associated with smaller risk spikes when aggregate conditions are weak. In the illiquidity channel, the association is more persistent, suggesting a liquidity-quality component whose relevance increases when market-wide trading conditions deteriorate. The central evidence comes from the joint analysis: a one-standard-deviation increase in ESG lowers the stress-period probability of severe cofragility by 0.92 percentage points, about 9% relative to the baseline. Double Machine Learning shows a similar negative ESG association after flexible adjustment for observable firm characteristics. Pillar evidence suggests stronger baseline resilience for Environmental scores and clearer stress amplification for Social scores. Overall, the findings characterize ESG as a multi-channel fragility signal for tail-risk monitoring, stress analysis, and pillar-level ESG assessment.
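The cofragility state is a joint indicator: a firm-month counts only if all three channels deteriorate together. A minimal sketch, assuming fixed cutoffs for each channel; in practice the cutoffs might be cross-sectional or historical percentiles, but the paper's exact thresholds for "severe" cofragility are not given here, and the field names (`ret`, `vol`, `illiq`) are hypothetical.

```python
def cofragility_flags(firm_months, ret_cut, vol_cut, illiq_cut):
    """Return one boolean per firm-month: True when a downside return,
    a volatility spike and high illiquidity all occur in that month."""
    return [
        fm["ret"] <= ret_cut          # downside return
        and fm["vol"] >= vol_cut      # volatility spike
        and fm["illiq"] >= illiq_cut  # deteriorating tradability
        for fm in firm_months
    ]


def cofragility_rate(firm_months, ret_cut, vol_cut, illiq_cut):
    """Share of firm-months in the joint fragility state."""
    flags = cofragility_flags(firm_months, ret_cut, vol_cut, illiq_cut)
    return sum(flags) / len(flags)
```

A binary outcome like this is what a probability model (for example a linear probability or logit regression on ESG) would then explain.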
Bankruptcy Prediction from 10-K Narratives: Evidence from Interpretable Text Scores and Accounting Baselines
Zhen Zhang, Moxuan Zheng, Tongchen Zhang, Luyun Lin, Yiqing Wang, Lixing Lin
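AI Summary: By constructing an interpretable Pre-Bankruptcy Stress score, this paper shows that 10-K narrative text carries incremental information for bankruptcy prediction beyond traditional accounting variables, significantly improving AUC and the top-decile bankruptcy capture rate.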
Bankruptcy is a low-frequency but high-impact corporate event, making early risk identification important for creditors, investors, regulators, and risk managers. Traditional bankruptcy-prediction models rely primarily on accounting ratios, but these measures may reflect financial deterioration only after it appears in reported financial statements. Narrative disclosures in annual 10-K filings may therefore provide incremental warning signals about emerging distress. This study examines whether 10-K narratives improve bankruptcy prediction beyond conventional accounting variables. Using firm-year observations matched to 10-K text, SEC financial statement data, and bankruptcy events from the Florida-UCLA-LoPucki Bankruptcy Research Database, the analysis evaluates bankruptcy risk over the year following the 10-K filing date. The paper develops a transparent Pre-Bankruptcy Stress (PB Stress) Score, a dictionary-based measure designed to capture distress-specific language related to liquidity and funding stress, debt covenant and refinancing stress, operating deterioration, restructuring and legal distress, and business fragility. The score is evaluated against a five-variable accounting baseline and a Loughran-McDonald dictionary benchmark. In the primary one-year holdout test, adding the PB Stress Score increases AUC from 0.8323 to 0.9019 and raises top-decile bankruptcy capture from 44.12% to 64.71%. The positive incremental pattern remains visible across bootstrap inference, alternative accounting benchmarks, alternative outcome definitions, and out-of-time validation. The findings indicate that distress-specific 10-K narratives provide interpretable incremental information for bankruptcy-risk monitoring beyond conventional accounting ratios.
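The two ingredients of the evaluation above, a dictionary-based text score and the ranking metrics (AUC and top-decile capture), can be sketched as follows. The three-category mini-dictionary is a hypothetical placeholder, not the paper's PB Stress lexicon (which covers five categories), and scaling hits per 1,000 words is an assumed normalisation. AUC is computed as the probability that a randomly chosen bankrupt firm outscores a randomly chosen non-bankrupt firm, counting ties as one half.

```python
import re

# Hypothetical mini-dictionary, for illustration only.
STRESS_TERMS = {
    "liquidity": {"going concern", "liquidity", "cash shortfall"},
    "covenant": {"covenant", "waiver", "refinance"},
    "restructuring": {"restructuring", "chapter 11", "default"},
}


def stress_score(text):
    """Whole-phrase stress-term hits per 1,000 words, summed over categories."""
    lower = text.lower()
    n_words = max(1, len(re.findall(r"[a-z']+", lower)))
    hits = sum(
        len(re.findall(r"\b" + re.escape(term) + r"\b", lower))
        for terms in STRESS_TERMS.values()
        for term in terms
    )
    return 1000.0 * hits / n_words


def auc(scores, labels):
    """Mann-Whitney AUC: share of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def top_decile_capture(scores, labels):
    """Share of all positives that fall in the top 10% of scores."""
    k = max(1, len(scores) // 10)
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(labels[i] for i in top) / sum(labels)
```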
Dual Representations of Robust Risk Measures and Uncertainty Sets
Marlon R. Moresco, Marcelo Righi, Silvana M. Pesenti
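AI Summary: Studies robust risk measures defined as worst-case values of convex risk measures over uncertainty sets, characterizes their continuity through consolidated uncertainty sets, derives dual representations, and establishes a set-valued dual representation for consolidated uncertainty sets.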
We consider robust risk measures that arise as worst-case values of convex risk measures evaluated on uncertainty sets. We characterize continuity properties of robust risk measures through their consolidated uncertainty sets, derive dual representations for robust risk measures, and develop a set-valued dual representation for consolidated uncertainty sets. The two dual frameworks rely on distinct geometric assumptions and are therefore complementary rather than interchangeable.
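One simple instance of a robust risk measure, assuming the uncertainty set is a finite collection of candidate probability vectors over the same loss outcomes and the base convex risk measure is Expected Shortfall (ES): the robust value is the largest ES across the candidates. The paper's uncertainty sets and consolidated sets are far more general; the function names and numbers here are hypothetical.

```python
def expected_shortfall(losses, probs, alpha):
    """ES at level alpha of a discrete loss distribution: the
    probability-weighted average of the worst (1 - alpha) tail."""
    tail = 1.0 - alpha
    remaining, total = tail, 0.0
    for loss, p in sorted(zip(losses, probs), reverse=True):  # worst first
        take = min(p, remaining)
        total += take * loss
        remaining -= take
        if remaining <= 1e-15:
            break
    return total / tail


def robust_es(losses, uncertainty_set, alpha):
    """Worst case of ES over a finite set of candidate probability vectors."""
    return max(expected_shortfall(losses, q, alpha) for q in uncertainty_set)
```

For losses `[0, 10, 100]` at `alpha = 0.95`, the candidate `[0.9, 0.08, 0.02]` gives an ES of 46 while `[0.85, 0.10, 0.05]` gives 100, so the robust value is 100.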
Can AI Refute Economic Theory? Evidence from Beyond the Knowledge Cutoff
Alexis Akira Toda
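AI Summary: Experimentally tests several AI models (Gemini, Refine, Claude, and ChatGPT) on checking four economic theory papers that contain errors, finding that ChatGPT Pro performs best but cannot find the errors independently, which indicates that AI cannot yet refute economic theory on its own.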
Can artificial intelligence (AI) refute economic theory? I document experiments in which I asked several AI models (Gemini, Refine, Claude, and ChatGPT) to check the correctness of four published papers in economic theory, each containing an error that I helped identify or correct. ChatGPT Pro performed best, occasionally constructing counterexamples and corrected proofs, while other models fared worse. However, no model located a true error without substantial human guidance, and data contamination complicates interpretation. I argue that a competent human paired with a frontier model can outperform current peer review, but AI cannot yet refute economic theory on its own.
PortBench: A Correlation-Aware, Full-Pipeline Benchmark for LLM-Driven Portfolio Management
Yuxuan Zhao, Sijia Chen, Ningxin Su
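AI Summary: Proposes the PortBench benchmark, which evaluates LLMs on portfolio management through static QA and a dynamic five-stage allocation pipeline, finding that most models fail to outperform equal-weight allocation and suffer from compounding reasoning errors and large drawdowns under stress.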
Large language models (LLMs) have shown strong performance across diverse financial tasks, yet portfolio management (PM), a critical financial decision-making task, remains poorly benchmarked. Existing benchmarks exhibit two main gaps: they ignore cross-asset correlation structures, thereby failing to distinguish genuinely diversified portfolios from concentrated ones, and they fail to evaluate the complete PM decision pipeline in real-world scenarios. We introduce PortBench, a benchmark spanning six heterogeneous asset classes over ten years. PortBench consists of two complementary layers: a static QA dataset of 6,269 correlation-based questions across seven task templates, and a dynamic five-stage allocation pipeline that mirrors the full PM decision cycle. To evaluate these layers, we introduce two dedicated metrics: a dual-layer correlation score that measures whether proposed portfolios exploit inter-class hedging and avoid intra-class concentration, and CEPS, a metric that quantifies how reasoning errors compound across pipeline stages. We further assess strategy robustness and investor alignment under three historical stress regimes and risk profiles. Evaluating ten frontier LLMs, we find that despite strong performance on static financial QA, 90% of model-profile combinations fail to outperform a basic equal-weight allocation, and models that satisfy every procedural constraint still suffer catastrophic drawdowns under stress. Our source code is available at https://github.com/AgenticFinLab/portbench.
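For reference, the equal-weight baseline and the drawdown statistic can be computed as below. Assumptions: simple per-period returns, full rebalancing to equal weights every period, and no transaction costs; PortBench's exact baseline and metric definitions may differ, and the function names are illustrative.

```python
def equal_weight_returns(asset_returns):
    """Per-period return of an equal-weight portfolio rebalanced every period.
    asset_returns[t][i] is the simple return of asset i in period t."""
    return [sum(row) / len(row) for row in asset_returns]


def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded wealth path, as a fraction."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = max(worst, 1.0 - wealth / peak)
    return worst
```

For example, returns of +10%, -50%, +20% take wealth from 1.0 to 1.1, 0.55 and 0.66, so the maximum drawdown is 50% even though the final loss is only 34%.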
Electricity Price Forecasting Across Norway's Five Bidding Zones in the Post-Crisis Era
My Thi Diem Phan, Trung Tuyen Truong, Hoai Phuong Ha, Dat Thanh Nguyen
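AI Summary: Studies electricity price forecasting across Norway's five bidding zones after the energy crisis, constructing a multimodal dataset and evaluating eight forecasting models; LightGBM performs best in all zones, and external features are shown to matter under different market conditions.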
Norway's electricity market is heavily dominated by hydropower, but the 2021-2022 energy crisis and stronger integration with Continental Europe have fundamentally altered price formation, reducing the reliability of forecasting models calibrated on historical data. Despite the critical need for updated models, a unified benchmark evaluating feature contributions across all structurally diverse Norwegian bidding zones remains lacking. Here we present a comprehensive evaluation of one-step-ahead forecasting of the Nord Pool market across all five Norwegian bidding zones. We constructed a multimodal hourly dataset spanning 2019-2025 and evaluated eight forecasting model families, including Light Gradient Boosting Machine (LightGBM), autoregressive models with exogenous variables, and advanced deep learning architectures, using a strictly causal test set. We implemented robust rolling-origin backtesting, leave-one-group-out feature ablation, and conditional regime analysis to dissect model performance and feature utility. Our results show that LightGBM achieves the best performance in every zone, with mean absolute error ranging from 1.60 to 5.58 euros per megawatt-hour, while a ridge-regularized autoregressive model with exogenous variables remains a highly competitive linear benchmark in northern zones. Feature ablation reveals that models relying solely on lagged prices and calendar variables achieve high accuracy and often match or closely approach the performance of the full multimodal model. However, conditional regime analysis demonstrates that external features like reservoir levels and gas prices remain crucial to stratify forecast errors, which consistently increase under stressed market regimes. This highlights the practical value of model interpretability and regime awareness for decision makers facing structural changes in market dynamics.
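A strictly causal rolling-origin evaluation can be sketched in a few lines: at each forecast origin the model sees only data before that point, and the absolute errors are averaged into a mean absolute error. The seasonal-naive forecaster (same hour on the previous day) is a hypothetical stand-in, not one of the eight model families in the paper, and the sketch omits model refitting.

```python
def rolling_origin_mae(series, forecaster, initial_train, step=1):
    """At each origin, forecast the next value from data strictly before
    that origin, then average the absolute errors."""
    errors = []
    for origin in range(initial_train, len(series), step):
        history = series[:origin]  # nothing at or after the origin leaks in
        errors.append(abs(forecaster(history) - series[origin]))
    return sum(errors) / len(errors)


def seasonal_naive(history, season=24):
    """Hypothetical baseline for hourly prices: same hour one day earlier."""
    return history[-season] if len(history) >= season else history[-1]
```

A feature-ablation run would repeat the same loop with models trained on different feature groups, so that every variant is scored on identical origins.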
Directional-Shift Dirichlet ARMA Models for Compositional Time Series with Structural-Break Interventions
Harrison Katz
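AI Summary: Proposes a Bayesian Dirichlet ARMA model with a directional-shift intervention mechanism for compositional time series with structural breaks, capturing the breaks through three interpretable parameters and validating the model's robustness and forecasting performance across different scenarios.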
Compositional time series frequently exhibit structural breaks due to external shocks, policy changes, or market disruptions. Standard methods either ignore such breaks or handle them through fixed effects that cannot extrapolate beyond the sample, or step-function dummies that impose instantaneous adjustment. We develop a Bayesian Dirichlet ARMA model augmented with a directional-shift intervention mechanism that captures structural breaks through three interpretable parameters: a direction vector specifying which components gain or lose share, an amplitude controlling redistribution magnitude, and a logistic gate governing transition timing and speed. The model preserves compositional constraints by construction, maintains DARMA dynamics for short-run dependence, and produces coherent probabilistic forecasts through and after structural breaks. The intervention trajectory corresponds to geodesic motion on the simplex and is invariant to the choice of ILR basis. A simulation study with 400 fits across 8 scenarios shows near-zero amplitude bias and nominal 80% credible interval coverage when the shift direction is correctly identified (77.5% of cases); supplementary studies confirm robustness across extreme transition speeds and non-monotone DGPs. Two empirical applications to COVID-era Airbnb data characterize performance relative to simpler alternatives. Where the break is monotone and ongoing, the intervention model achieves near-nominal calibration (79.6%) while the fixed effect substantially under-covers (66.1%). Where post-break dynamics are non-monotone, both models are acceptably calibrated and the fixed effect outperforms on point accuracy. The intervention model's advantages are thus specific to settings with roughly monotone structural transitions.
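One natural way to realise a direction/amplitude/logistic-gate shift on the simplex is sketched below, as an assumption rather than the paper's exact parameterisation: the baseline composition is perturbed by exp(amplitude × gate(t) × direction), where the direction is a zero-sum vector in centred log-ratio (clr) coordinates. Because the clr shift is linear in the gate, the trajectory is a straight line in clr space, i.e. a geodesic in Aitchison geometry, and nothing in the construction depends on a choice of ILR basis. Function names and parameter values are hypothetical.

```python
import math


def closure(v):
    """Rescale positive parts so they sum to one."""
    s = sum(v)
    return [x / s for x in v]


def clr(x):
    """Centred log-ratio transform of a composition."""
    logs = [math.log(v) for v in x]
    m = sum(logs) / len(logs)
    return [l - m for l in logs]


def logistic_gate(t, tau, speed):
    """Transition weight in (0, 1): ~0 well before tau, ~1 well after it.
    Written in two branches to avoid overflow in math.exp."""
    z = speed * (t - tau)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def shifted_composition(base, direction, amplitude, t, tau, speed):
    """Perturb `base` along a zero-sum clr direction, scaled by the gate."""
    if abs(sum(direction)) > 1e-12:
        raise ValueError("direction must sum to zero (a clr-space vector)")
    g = logistic_gate(t, tau, speed)
    return closure([b * math.exp(amplitude * g * d) for b, d in zip(base, direction)])
```

At the gate midpoint (`t == tau`) the clr coordinates have moved exactly half of `amplitude * direction`, which is the straight-line property described above.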
FDI versus R&D in an Endogenous Growth Model
Thanh Tam Nguyen-Huu, Ngoc-Sang Pham
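AI Summary: Uses an optimal growth model to study the role of FDI and R&D in host countries' transitional dynamics, finding that relying solely on FDI may lead to a middle-income trap, whereas investing in R&D allows sustained growth, with FDI helping the host country in the early stages of development.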
We investigate the role of foreign direct investment (FDI) and research and development (R&D) in the transitional dynamics of host countries using an optimal growth model. FDI may benefit the host country's GNP by enabling multinational enterprises to hire local workers. However, if the host country focuses solely on FDI, it may fall into a middle-income trap. Most importantly, we show that if the host country invests in R&D, its economy can reach sustained growth. In this case, FDI benefits the host country, but only in the early stages of its development process.
Nonlinear Factor Decomposition via Kolmogorov-Arnold Networks: A Spectral Approach to Asset Return Analysis
David Breazu
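AI Summary: Proposes KAN-PCA, an autoencoder with a KAN encoder and a linear decoder that replaces linear projections with learned B-spline functions on each edge in order to capture more variance than classical PCA. Experiments on 20 S&P 500 stocks show a higher reconstruction R² for KAN-PCA, with out-of-sample performance matching PCA after correcting for data leakage.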
KAN-PCA is an autoencoder that uses a KAN as encoder and a linear map as decoder. It generalizes classical PCA by replacing linear projections with learned B-spline functions on each edge. The motivation is to capture more variance than classical PCA, which becomes inefficient during market crises when the linear assumption breaks down and correlations between assets change dramatically. We prove that if the spline activations are forced to be linear, KAN-PCA yields exactly the same results as classical PCA, establishing PCA as a special case. Experiments on 20 S&P 500 stocks (2015-2024) show that KAN-PCA achieves a reconstruction R² of 66.57%, compared to 62.99% for classical PCA with the same 3 factors, while matching PCA out-of-sample after correcting for data leakage in the training procedure.
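Each KAN edge carries a learnable one-dimensional function, typically a linear combination of B-spline basis functions. The sketch below evaluates such an edge with the Cox-de Boor recursion and checks a classical property behind the "linear splines recover PCA" argument: choosing the coefficients equal to the Greville abscissae makes the edge function exactly the identity on its grid, so an edge can represent a linear map. This illustrates the spline mechanics only; it is not the paper's proof or code, and the knot grid and function names are hypothetical.

```python
def bspline_basis(i, k, knots, x):
    """Cox-de Boor recursion for the i-th B-spline of degree k (0/0 := 0)."""
    if k == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + k] != knots[i]:
        left = (x - knots[i]) / (knots[i + k] - knots[i]) * bspline_basis(i, k - 1, knots, x)
    if knots[i + k + 1] != knots[i + 1]:
        right = ((knots[i + k + 1] - x) / (knots[i + k + 1] - knots[i + 1])
                 * bspline_basis(i + 1, k - 1, knots, x))
    return left + right


def edge_function(coeffs, k, knots, x):
    """phi(x) = sum_i c_i B_i(x): one learnable KAN edge."""
    return sum(c * bspline_basis(i, k, knots, x) for i, c in enumerate(coeffs))


def greville(k, knots):
    """Greville abscissae; using them as coefficients makes phi(x) = x."""
    n = len(knots) - k - 1
    return [sum(knots[i + 1:i + k + 1]) / k for i in range(n)]
```

With a clamped cubic grid such as `[-1]*4 + [-0.5, 0, 0.5] + [1]*4`, coefficients `a * greville(...)` give the linear edge `a * x` on [-1, 1), while all-ones coefficients give the constant 1 (partition of unity).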
Computing Moments by Integrating the Moment-Generating Function
Peter Reinhard Hansen, Chen Tong
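AI Summary: Proposes a general integral framework for computing fractional, complex, absolute, and logarithmic moments from the moment-generating function under explicit regularity conditions. Evaluating a complex extension of the moment-generating function along a vertical contour yields exact integral expressions that avoid the need for explicit probability densities and high-order derivatives.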
We introduce a general integral framework for computing fractional, complex, absolute, and logarithmic moments from the moment-generating function (MGF) under explicit regularity conditions. By evaluating a complex extension of the MGF along a vertical contour, we obtain exact integral expressions that bypass the need for explicit probability densities and high-order derivatives. We establish conditions for negative fractional moments using the symmetric Cauchy principal value, including the requirement that the distribution have no point mass at the centering point. We demonstrate the theoretical scope and computational practicality of the framework through applications to the normal-inverse Gaussian distribution and a semicontinuous compound Poisson-Gamma distribution. In the latter case, the framework handles point masses at the boundary by evaluating conditional fractional moments.
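The paper's contour-integral approach is not reproduced here, but a simpler real-line relative conveys the idea of obtaining a fractional moment from the MGF without a density: for a positive random variable X and r > 0, E[X^(-r)] = (1/Γ(r)) ∫_0^∞ t^(r-1) M(-t) dt. The sketch substitutes t = u^(1/r), which removes the singularity at zero, maps the half-line to [0, 1), and applies the midpoint rule. It assumes X > 0 and E[X^(-r)] < ∞; the function name and step count are illustrative.

```python
import math


def negative_moment(mgf, r, n=20000):
    """E[X^(-r)] for a positive random variable X with MGF `mgf`, using
    E[X^(-r)] = (1 / Gamma(r + 1)) * integral_0^inf M(-u^(1/r)) du,
    with u = s / (1 - s) mapping s in [0, 1) onto [0, inf)."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        s = (k + 0.5) * h                      # midpoint of the k-th cell
        u = s / (1.0 - s)
        total += mgf(-(u ** (1.0 / r))) / (1.0 - s) ** 2  # du = ds / (1 - s)^2
    return total * h / math.gamma(r + 1.0)
```

As a check, for X ~ Gamma(shape 3, rate 1) the MGF is (1 - θ)^(-3), and the exact value is E[X^(-r)] = Γ(3 - r)/Γ(3), e.g. 0.5 for r = 1.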
Long-Range Dependence in Financial Markets: Empirical Evidence and Generative Modeling Challenges
Yifan He, Svetlozar Rachev
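AI Summary: Empirically examines long-range dependence in financial markets and evaluates how well deep generative models reproduce such temporal structure. Although mean returns show limited persistence, conditional volatility exhibits pronounced long memory for most assets. Quant GANs are then tested on whether they can learn and reproduce these stylized temporal dependencies, but the generated series fall short of capturing the magnitude and consistency of the long-range dependence observed in real data.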
This study provides a comprehensive empirical investigation of long-range dependence (LRD) in financial markets and evaluates the ability of deep generative models to reproduce such temporal structures. Using daily data from three representative sectors, namely equity (S&P 500, DAX, Nikkei 225), commodities (Wheat, Corn, Soybeans), and energy (UNG, USO, XLE), we examine the presence of LRD through three complementary approaches: rescaled range (R/S) analysis, detrended fluctuation analysis (DFA), and an ARFIMA-FIGARCH model with Student's t-distributed innovations. The empirical evidence suggests that while mean returns exhibit limited persistence, pronounced long memory is consistently observed in conditional volatility across most assets. Building on these findings, we assess whether Quant Generative Adversarial Networks (Quant GANs) can learn and reproduce these stylized temporal dependencies. Although the generated series successfully mimic heavy-tailed return distributions and certain aspects of volatility clustering, they generally fail to capture the magnitude and consistency of LRD observed in real data, particularly in volatility dynamics. These results highlight an important limitation of current deep generative architectures in modeling slow-decaying dependence structures and underscore the need for incorporating explicit long-memory mechanisms when synthetic financial data are intended for risk management or long-horizon forecasting applications.
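Of the three LRD tools named above, R/S analysis is the simplest to sketch: split the series into windows, divide the range of cumulative mean-adjusted sums by the window's standard deviation, and regress log(R/S) on log(window size). A slope near 0.5 suggests no long memory, while values well above 0.5 suggest persistence; applying it to absolute or squared returns is one common way to probe volatility memory. Function names and window sizes are illustrative, and this plain estimator carries a known small-sample upward bias.

```python
import math


def rescaled_range(x):
    """R/S statistic of one window: range of the cumulative mean-adjusted
    sums divided by the (population) standard deviation."""
    n = len(x)
    mean = sum(x) / n
    cum, lo, hi = 0.0, 0.0, 0.0
    for v in x:
        cum += v - mean
        lo, hi = min(lo, cum), max(hi, cum)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return (hi - lo) / std


def hurst_rs(x, window_sizes):
    """Hurst exponent estimate: OLS slope of log(mean R/S) on log(window)."""
    pts = []
    for w in window_sizes:
        chunks = [x[i:i + w] for i in range(0, len(x) - w + 1, w)]
        rs = [rescaled_range(c) for c in chunks]
        pts.append((math.log(w), math.log(sum(rs) / len(rs))))
    mx = sum(p[0] for p in pts) / len(pts)
    my = sum(p[1] for p in pts) / len(pts)
    num = sum((a - mx) * (b - my) for a, b in pts)
    den = sum((a - mx) ** 2 for a, _ in pts)
    return num / den
```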
Is Attention Really All We Need? An Empirical Study of Pre-Trained RNN Sparse and Global Attention Models in Asset Pricing
Shanyan Lai
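AI Summary: Studies pre-trained RNN attention models in asset pricing, examining how attention mechanisms improve the capture of temporal dependency and long memory, and how stable the models are under different market conditions.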
This study investigates pre-trained RNN attention models built on mainstream attention mechanisms, namely additive attention, Luong's three attention variants, global self-attention, and sliding-window sparse attention, for empirical asset pricing research on the top 420 large-cap US stocks. This is the first paper to apply large-scale state-of-the-art (SOTA) attention mechanisms in the asset pricing context. These models overcome limitations of traditional machine-learning-based asset pricing, such as poorly capturing temporal dependency and having only short memory. Moreover, the enforced causal masks in the attention mechanisms address the future-data leakage issue ignored by more advanced attention-based models such as the classic Transformer. The proposed attention models also account for the temporal sparsity of asset pricing data and mitigate potential overfitting by using simplified model structures. This provides some insights for future empirical economic research. All models are examined over three periods, covering pre-COVID-19, COVID-19, and one year post-COVID-19, to test their stability under extreme market conditions. The study finds that in value-weighted portfolio backtesting, the global self-attention model and the sliding-window sparse attention model excel at generating absolute returns and hedging downside risk, achieving annualized Sortino ratios of 2.0 and 1.80, respectively, during the COVID-19 period in the static transaction cost scenario. Moreover, the sliding-window sparse attention model delivers more stable absolute portfolio returns than the global self-attention model across stocks of different market capitalizations.
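A minimal sketch of the causal and sliding-window masking described above, written as single-head scaled dot-product attention over plain Python lists: position i only sees positions j ≤ i, and with a window w only the most recent w of them. It is not the paper's pre-trained RNN attention architecture; the function name and toy vectors are hypothetical.

```python
import math


def causal_window_attention(q, k, v, window=None):
    """Scaled dot-product attention in which position i attends only to
    positions j with j <= i (causal mask) and, if `window` is given,
    i - j < window (sliding-window sparsity)."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        allowed = [j for j in range(i + 1) if window is None or i - j < window]
        scores = [sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d) for j in allowed]
        m = max(scores)  # subtract the max for a numerically stable softmax
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wj * v[j][c] for wj, j in zip(w, allowed)) / z
                    for c in range(len(v[0]))])
    return out
```

Because masked positions never enter the sum, changing a future value leaves every earlier output unchanged, which is exactly the leakage guarantee the causal mask is meant to provide.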