arXivDaily: arXiv daily academic digest, updated Monday through Friday
q-fin.CP Computational Finance (4)
2606.11798 2026-06-11 q-fin.CP cs.LG math.OC New submission

Deterministic Policy Gradient for Learning Equilibrium in Time-Inconsistent Control Problems

Xin Guo, Yijie Huang, Xiang Yu

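AI summary: Proposes a continuous-time, model-free reinforcement learning algorithm that learns equilibrium policies for time-inconsistent control problems via deterministic policy gradient and inner fixed point iteration, and validates it on mean-variance portfolio management and tracking portfolios under non-exponential discounting.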

Comments
Keywords: Time-inconsistent control, two-stage reformulation, model-free continuous-time reinforcement learning, deterministic policy gradient, fixed point iteration
English abstract

In this paper, we develop a continuous-time model-free reinforcement learning algorithm to learn deterministic equilibrium policies in general time-inconsistent control problems. Utilizing the extended Hamilton-Jacobi-Bellman system, we recast the original time-inconsistent problem into an equivalent two-stage problem. In the first stage, for given auxiliary functions, we employ the deterministic policy gradient approach to learn an optimal policy in an auxiliary time-consistent control problem. In the second stage, given the updated policy, we exploit inner fixed point iterations and some martingale characterizations to learn the auxiliary functions. As a theoretical contribution, we provide some mild model assumptions and establish the convergence of the inner fixed point iterations. By repeating this actor-critic style of iterations across the two stages, our algorithm aims to learn the equilibrium under different sources of time-inconsistency in a unified manner. The superior effectiveness of the proposed algorithm is illustrated in two classical financial applications with time-inconsistency: mean-variance portfolio management and optimal tracking portfolio under non-exponential discounting.
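
As a rough illustration of why non-exponential discounting is a source of time-inconsistency, the sketch below compares an exponential and a hyperbolic discount function on a fixed pair of rewards. It is a toy deterministic example, not the paper's continuous-time stochastic setting or its algorithm; the function names, reward sizes, dates, and discount parameters are all assumptions chosen for illustration.

```python
import math


def exponential(delay: float, rate: float = 0.1) -> float:
    """Exponential discount factor exp(-rate * delay)."""
    return math.exp(-rate * delay)


def hyperbolic(delay: float, k: float = 1.0) -> float:
    """Hyperbolic discount factor 1 / (1 + k * delay), a non-exponential example."""
    return 1.0 / (1.0 + k * delay)


def prefers_later(discount, now: float,
                  t_small: float = 5.0, r_small: float = 1.0,
                  t_large: float = 6.0, r_large: float = 1.5) -> bool:
    """Return True if, evaluated at time `now`, the larger-later reward
    is worth more than the smaller-sooner reward (hypothetical rewards)."""
    v_small = r_small * discount(t_small - now)
    v_large = r_large * discount(t_large - now)
    return v_large > v_small


if __name__ == "__main__":
    for name, disc in (("exponential", exponential), ("hyperbolic", hyperbolic)):
        plan_at_0 = prefers_later(disc, now=0.0)
        plan_at_5 = prefers_later(disc, now=5.0)
        print(f"{name}: prefers later at t=0: {plan_at_0}, at t=5: {plan_at_5}")
```

With the exponential discount the ranking of the two rewards is the same at both evaluation times, because the ratio of their discounted values does not depend on `now`. With the hyperbolic discount the ranking flips between `now=0` and `now=5`, so a plan that looks optimal today is abandoned later. This kind of reversal is what motivates seeking an equilibrium policy rather than an optimal one.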

2606.11223 2026-06-11 q-fin.CP cs.FL New submission

Scenario Constraints with Memory: A Finite-State Approach to Quantitative Financial Analysis

Vitaly Nürnberg

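AI summary: Proposes a quantitative framework based on event history automata (EHAs) and weighted finance finite automata (WFFAs) that computes extremal payoff bounds through a synchronized product and extracts interpretable witness event sequences, for the exact extremal analysis of financial systems.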

English abstract

Quantifying worst-case and best-case performance under complex market scenarios is a persistent challenge in financial risk management and the verification of path-dependent financial instruments, such as exotic options and structured products. Simulation-based methods are well suited for probabilistic estimation, but they do not directly provide exhaustive guarantees over all admissible scenarios or explicit witnesses for extremal outcomes. To address this, we introduce a quantitative automata-based framework for the exact extremal analysis of financial systems under declarative scenario constraints. At the core of our approach are event history automata (EHAs), a new formal model that integrates regular-expression event patterns with admissible numerical intervals to represent constrained event histories with memory. Quantitative payoffs are represented by weighted finance finite automata (WFFAs), which allow transition weights to depend on observed market values. By computing the synchronized product of EHAs and WFFAs, our framework enables the exact calculation of upper and lower payoff bounds. Furthermore, the method automatically extracts interpretable witness event histories that realize these extremal outcomes. We demonstrate the practical viability of the approach through a case study of an autocallable structured product with path-dependent mechanisms. The case study analyzes how different scenario constraints affect coupon accumulation, early redemption, and protection-loss outcomes. Scalability experiments indicate that the framework's execution remains computationally feasible for practical contract horizons and nontrivial constraint configurations. Overall, this approach provides a mathematically rigorous complement to standard financial simulation methods.
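
The idea of a synchronized product that yields exact payoff bounds and witnesses can be sketched on a toy example. The code below is not the paper's EHA or WFFA formalism (it has no numerical intervals and no market-value-dependent weights); it only pairs a small constraint automaton with a small weighted automaton and runs a dynamic program over the product states. The event names, states, weights, and the function `extremal_payoff` are hypothetical.

```python
from typing import Dict, List, Tuple

EVENTS = ("up", "down")

# Constraint automaton (assumed): forbids two consecutive "down" events.
# A missing transition means the event is inadmissible in that state.
CONSTRAINT: Dict[Tuple[str, str], str] = {
    ("ok", "up"): "ok",
    ("ok", "down"): "after_down",
    ("after_down", "up"): "ok",
}

# Weighted payoff automaton (assumed): (state, event) -> (next state, weight).
PAYOFF: Dict[Tuple[str, str], Tuple[str, float]] = {
    ("protected", "up"): ("protected", 1.0),
    ("protected", "down"): ("breached", 0.0),
    ("breached", "up"): ("breached", 1.0),
    ("breached", "down"): ("breached", -3.0),
}


def extremal_payoff(horizon: int, maximize: bool = True) -> Tuple[float, List[str]]:
    """Exact best or worst total payoff over all admissible event histories
    of length `horizon`, together with one witness history."""
    # For each reachable product state, keep only the extremal (value, history).
    frontier = {("ok", "protected"): (0.0, [])}
    for _ in range(horizon):
        nxt: Dict[Tuple[str, str], Tuple[float, List[str]]] = {}
        for (c, p), (value, history) in frontier.items():
            for event in EVENTS:
                if (c, event) not in CONSTRAINT:
                    continue  # history violates the scenario constraint
                c2 = CONSTRAINT[(c, event)]
                p2, weight = PAYOFF[(p, event)]
                cand = value + weight
                key = (c2, p2)
                better = key not in nxt or (
                    cand > nxt[key][0] if maximize else cand < nxt[key][0]
                )
                if better:
                    nxt[key] = (cand, history + [event])
        frontier = nxt
    pick = max if maximize else min
    return pick(frontier.values(), key=lambda entry: entry[0])


if __name__ == "__main__":
    print("upper bound and witness:", extremal_payoff(4, maximize=True))
    print("lower bound and witness:", extremal_payoff(4, maximize=False))
```

Because weights are additive and depend only on the current product state and event, keeping one extremal entry per product state at each step is enough to obtain exact bounds, and the stored history serves as a witness. Unlike simulation, the search covers every admissible history of the given length.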

2605.18343 2026-06-11 q-fin.CP q-fin.PR Updated version

Explicit Rational Formulae for Bachelier (Normal) Implied Volatility

Fabien Le Floc'h

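AI summary: Proposes two explicit, iteration-free rational formulae that compute the Bachelier implied volatility directly from the option price, forward, strike, and expiry, with accuracy close to machine precision.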

English abstract

We present two explicit rational formulae for Bachelier, or normal, implied volatility. The formulae take the option price, forward, strike, and expiry as inputs and return the implied normal volatility without iteration. They follow the branch structure of LFK-4, but use the simpler near-the-money variable given by the absolute forward-strike difference divided by the tail time value, avoiding a logarithm and a small-argument Taylor branch in that region. LFK-2026 is the accuracy-oriented formula and approximates reciprocal absolute standardized moneyness directly in the far tail. LFK-2026C keeps the same shifted out-of-the-money rational tail approximation, but splits the near-the-money branch into two low-degree rationals. In double-precision tests, both remain close to machine accuracy, while LFK-2026C is the faster scalar implementation on the current benchmark mix.
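
For context on what is being inverted, the sketch below prices an undiscounted call under the Bachelier model and recovers the normal volatility by plain bisection. This is deliberately the slow iterative baseline, not the LFK-2026 or LFK-2026C formulae, whose coefficients the abstract does not give. The function names, the undiscounted-price convention, and the tolerance are assumptions.

```python
import math


def norm_pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution, via erfc for tail accuracy."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def bachelier_call(forward: float, strike: float, expiry: float, vol: float) -> float:
    """Undiscounted Bachelier call price with normal volatility `vol`."""
    s = vol * math.sqrt(expiry)
    if s == 0.0:
        return max(forward - strike, 0.0)
    d = (forward - strike) / s
    return (forward - strike) * norm_cdf(d) + s * norm_pdf(d)


def implied_vol_bisection(price: float, forward: float, strike: float,
                          expiry: float, tol: float = 1e-12,
                          max_iter: int = 200) -> float:
    """Iterative reference inversion; the price is increasing in vol."""
    if price <= max(forward - strike, 0.0):
        raise ValueError("price must exceed intrinsic value")
    lo, hi = 0.0, 1.0
    while bachelier_call(forward, strike, expiry, hi) < price:
        hi *= 2.0  # expand the bracket until it contains the root
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if bachelier_call(forward, strike, expiry, mid) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    target = bachelier_call(forward=100.0, strike=101.0, expiry=2.0, vol=5.0)
    print(implied_vol_bisection(target, 100.0, 101.0, 2.0))
```

At the money the price reduces to `vol * sqrt(expiry) / sqrt(2 * pi)`, so the inversion is trivial there. Away from the money no such closed form exists, which is why explicit approximations of the kind described in the abstract are useful as a replacement for a loop like the one above.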

2603.19225 2026-06-11 cs.CE cs.AI cs.CL cs.IR q-fin.CP Updated version

FinTradeBench: A Financial Reasoning Benchmark for LLMs

Yogesh Agrawal, Aniruddha Dutta, Md Mahadi Hasan, Santu Karmaker, Aritra Dutta

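AI summary: Proposes the FinTradeBench benchmark, which combines company fundamentals with trading signals to evaluate large language models on financial reasoning, and finds that retrieval augmentation offers limited help for numerical and time-series reasoning.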

Comments
9 pages main text, 31 pages total (including references and appendix). 5 figures, 16 tables. Preprint under review. Code and data will be made available upon publication
English abstract

Real-world financial decision-making is a challenging problem that requires reasoning over heterogeneous signals, including company fundamentals derived from regulatory filings and trading signals computed from price dynamics. Recently, with advances in Large Language Models (LLMs), financial analysts have begun to use them for financial decision-making tasks. However, existing financial question-answering benchmarks for testing these models primarily focus on company balance sheet data and rarely evaluate reasoning about how company stocks trade in the market or how trading behavior interacts with fundamentals. To leverage the strengths of both approaches, we introduce FinTradeBench, a benchmark for evaluating financial reasoning that integrates company fundamentals and trading signals. FinTradeBench contains 1,400 questions grounded in NASDAQ-100 companies over a ten-year historical window. The benchmark is organized into three reasoning categories: fundamentals-focused, trading-signal-focused, and hybrid questions requiring cross-signal reasoning. To ensure reliability at scale, we adopt a calibration-then-scaling framework that combines expert seed questions, multi-model response generation, intra-model self-filtering, numerical auditing, and human-LLM judge alignment. We evaluate 14 LLMs under zero-shot prompting and retrieval-augmented settings and observe a clear performance gap. Retrieval substantially improves reasoning over textual fundamentals, but provides limited benefit for trading-signal reasoning. These findings highlight fundamental challenges in numerical and time-series reasoning for current LLMs and motivate future research in financial intelligence.