arXivDaily: arXiv daily academic digest, updated Monday through Friday
2603.08956 2026-03-25 econ.GN cs.LG q-fin.EC

A Survey of Reinforcement Learning For Economics

Pranjal Rawat

Abstract

This survey (re)introduces reinforcement learning methods to economists. The curse of dimensionality limits how far exact dynamic programming can be effectively applied, forcing us to rely on suitably "small" problems or our ability to convert "big" problems into smaller ones. While this reduction has been sufficient for many classical applications, a growing class of economic models resists such reduction. Reinforcement learning algorithms offer a natural, sample-based extension of dynamic programming, extending tractability to problems with high-dimensional states, continuous actions, and strategic interactions. I review the theory connecting classical planning to modern learning algorithms and demonstrate their mechanics through simulated examples in pricing, inventory control, strategic games, and preference elicitation. I also examine the practical vulnerabilities of these algorithms, noting their brittleness, sample inefficiency, sensitivity to hyperparameters, and the absence of global convergence guarantees outside of tabular settings. The successes of reinforcement learning remain strictly bounded by these constraints, as well as a reliance on accurate simulators. When guided by economic structure, reinforcement learning provides a remarkably flexible framework. It stands as an imperfect, but promising, addition to the computational economist's toolkit. A companion survey (Rust and Rawat, 2026b) covers the inverse problem of inferring preferences from observed behavior. All simulation code is publicly available.
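
To make the link between exact dynamic programming and sample-based learning concrete, here is a minimal sketch that solves one tiny pricing problem twice: once by value iteration, which uses the full transition model, and once by tabular Q-learning, which only sees sampled transitions. The two-state model (`REWARD`, `P_HIGH`), the discount factor, and the step size are invented for illustration and are not taken from the survey or its companion code.

```python
import random

GAMMA = 0.9  # discount factor (assumed)
# States: 0 = low demand, 1 = high demand. Actions: 0 = low price, 1 = high price.
REWARD = [[1.0, 0.0], [1.0, 3.0]]   # REWARD[s][a], hypothetical profits
P_HIGH = [[0.5, 0.2], [0.8, 0.6]]   # P_HIGH[s][a] = P(next state is high demand)


def backup(s, a, v):
    """One-step Bellman backup for (s, a) given state values v."""
    p = P_HIGH[s][a]
    return REWARD[s][a] + GAMMA * ((1 - p) * v[0] + p * v[1])


def value_iteration(tol=1e-10):
    """Exact dynamic programming: iterate the Bellman operator on Q."""
    q = [[0.0, 0.0], [0.0, 0.0]]
    while True:
        v = [max(row) for row in q]
        new_q = [[backup(s, a, v) for a in range(2)] for s in range(2)]
        change = max(abs(new_q[s][a] - q[s][a]) for s in range(2) for a in range(2))
        q = new_q
        if change < tol:
            return q


def q_learning(steps=50_000, alpha=0.05, seed=0):
    """Sample-based counterpart: update Q from simulated transitions only."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    s = 0
    for _ in range(steps):
        a = rng.randrange(2)  # uniform exploration (off-policy)
        s_next = 1 if rng.random() < P_HIGH[s][a] else 0
        target = REWARD[s][a] + GAMMA * max(q[s_next])
        q[s][a] += alpha * (target - q[s][a])
        s = s_next
    return q


def greedy(q):
    """Greedy action in each state."""
    return [row.index(max(row)) for row in q]


if __name__ == "__main__":
    print("value iteration policy:", greedy(value_iteration()))
    print("Q-learning policy:     ", greedy(q_learning()))
```

With only four state-action pairs both methods are cheap; the point of the sample-based version is that it never enumerates the transition model, which is what lets the same idea scale to larger state spaces once the table is replaced by a function approximator.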

2510.23421 2026-03-25 econ.GN cs.AI q-fin.EC

Quantifying Systemic Vulnerability in the Foundation Model Industry

Claudio Pirrone, Stefano Fricano, Gioacchino Fazio

Comments: Conference Paper - SIEPI (29-30 January 2026) - Bari

Abstract

The foundation model industry exhibits unprecedented concentration in critical inputs: semiconductors, energy infrastructure, elite talent, capital, and training data. Despite extensive sectoral analyses, no comprehensive framework exists for assessing overall industrial vulnerability. We develop the Artificial Intelligence Industrial Vulnerability Index (AIIVI) grounded in O-Ring production theory, recognizing that foundation model production requires simultaneous availability of non-substitutable inputs. Given extreme data opacity and rapid technological evolution, we implement a validated human-in-the-loop methodology using large language models to systematically extract indicators from dispersed grey literature, with complete human verification of all outputs. Applied to six state-of-the-art foundation model developers, AIIVI equals 0.82, indicating extreme vulnerability driven by compute infrastructure (0.85) and energy systems (0.90). While industrial policy currently emphasizes semiconductor capacity, energy infrastructure represents the emerging binding constraint. This methodology proves applicable to other fast-evolving, opaque industries where traditional data sources are inadequate.

2305.11523 2026-03-25 econ.GN q-fin.EC

AI Regulation in the European Union: Examining Non-State Actor Preferences

Jonas Tallberg, Magnus Lundgren, Johannes Geith

Journal ref: Bus. Polit. 26 (2024) 218-239
Abstract

As the development and use of artificial intelligence (AI) continue to grow, policymakers are increasingly grappling with the question of how to regulate this technology. The most far-reaching international initiative is the European Union (EU) AI Act, which aims to establish the first comprehensive, binding framework for regulating AI. In this article, we offer the first systematic analysis of non-state actor preferences toward international regulation of AI, focusing on the case of the EU AI Act. Theoretically, we develop an argument about the regulatory preferences of business actors and other non-state actors under varying conditions of AI sector competitiveness. Empirically, we test these expectations using data from public consultations on European AI regulation. Our findings are threefold. First, all types of non-state actors express concerns about AI and support regulation in some form. Second, there are nonetheless significant differences across actor types, with business actors being less concerned about the downsides of AI and more in favor of lax regulation than other non-state actors. Third, these differences are more pronounced in countries with stronger commercial AI sectors. Our findings shed new light on non-state actor preferences toward AI regulation and point to challenges for policymakers balancing competing interests in society.

2603.23300 2026-03-25 q-fin.PM cs.AI cs.MA q-fin.ST

Designing Agentic AI-Based Screening for Portfolio Investment

Mehmet Caner, Agostino Capponi, Nathan Sun, Jonathan Y. Tan

Abstract

We introduce a new agentic artificial intelligence (AI) platform for portfolio management. Our architecture consists of three layers. First, two large language model (LLM) agents are assigned specialized tasks: one agent screens for firms with desirable fundamentals, while a sentiment analysis agent screens for firms with desirable news. Second, these agents deliberate to generate and agree upon buy and sell signals from a large portfolio, substantially narrowing the pool of candidate assets. Finally, we apply a high-dimensional precision matrix estimation procedure to determine optimal portfolio weights. A defining theoretical feature of our framework is that the number of assets in the portfolio is itself a random variable, realized through the screening process. We introduce the concept of sensible screening and establish that, under mild screening errors, the squared Sharpe ratio of the screened portfolio consistently estimates its target. Empirically, our method achieves superior Sharpe ratios relative to an unscreened baseline portfolio and to conventional screening approaches, evaluated on S&P 500 data over the period 2020-2024.
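
As a rough illustration of how a precision matrix (the inverse covariance matrix) turns into portfolio weights and a squared Sharpe ratio, the sketch below computes global minimum-variance weights and the maximum squared Sharpe ratio `mu' * inv(cov) * mu` for a known covariance matrix. It uses plain linear solves rather than the high-dimensional estimator the paper applies, and the abstract does not say which weight objective the authors use, so the choice of minimum-variance weights, the function names, and the numbers are all assumptions.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def min_variance_weights(cov):
    """Global minimum-variance weights: inv(cov) 1 / (1' inv(cov) 1)."""
    z = solve(cov, [1.0] * len(cov))
    total = sum(z)
    return [zi / total for zi in z]


def squared_sharpe(mu, cov):
    """Maximum attainable squared Sharpe ratio: mu' inv(cov) mu."""
    z = solve(cov, mu)
    return sum(m_i * z_i for m_i, z_i in zip(mu, z))


if __name__ == "__main__":
    cov = [[0.04, 0.0], [0.0, 0.01]]   # hypothetical covariance of excess returns
    mu = [0.10, 0.05]                  # hypothetical mean excess returns
    print(min_variance_weights(cov))
    print(squared_sharpe(mu, cov))
```

In the paper's setting the set of assets entering `cov` is itself the outcome of the screening step, so the dimension of this calculation is random rather than fixed in advance.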

2603.23024 2026-03-25 econ.GN q-fin.EC

Heart Failure's First Shock and Nurse-Led Chronic Care

Moslem Rashidi, Luke B. Connelly, Gianluca Fiorentini

Abstract

We study how a first heart-failure hospitalization, an adverse health shock, changes patients' care, and whether a nurse-led chronic-care program sustains those post-shock investments. Using linked population-wide administrative records from Italy's Romagna Local Health Authority (2017-2023), we anchor event time at each patient's first CHF admission and exploit staggered timing to estimate dynamic effects. The shock triggers a sharp post-discharge surge: beta-blocker adherence, cardiology follow-up, and echocardiography rise immediately, while emergency-room use spikes just before admission and then stabilizes. We then estimate the incremental impact of enrollment in the Nurse-led Program for Chronic Patients (NPCP) using the interaction-weighted event-study estimator for staggered adoption. Under conventional difference-in-differences inference, NPCP strengthens long-run preventive engagement, with little detectable change in emergency-room use. HonestDiD sensitivity analysis indicates these gains are economically meaningful but not statistically definitive under modest departures from parallel trends.

2603.22886 2026-03-25 cs.LG q-fin.GN q-fin.ST

Conditionally Identifiable Latent Representation for Multivariate Time Series with Structural Dynamics

Minkey Chang, Jae-Young Kim

Comments: Accepted paper for 2026 ICLR FINAI workshop

Abstract

We propose the Identifiable Variational Dynamic Factor Model (iVDFM), which learns latent factors from multivariate time series with identifiability guarantees. By applying iVAE-style conditioning to the innovation process driving the dynamics rather than to the latent states, we show that factors are identifiable up to permutation and component-wise affine (or monotone invertible) transformations. Linear diagonal dynamics preserve this identifiability and admit scalable computation via companion-matrix and Krylov methods. We demonstrate improved factor recovery on synthetic data, stable intervention accuracy on synthetic SCMs, and competitive probabilistic forecasting on real-world benchmarks.

2603.22880 2026-03-25 q-fin.GN cs.CE q-fin.PM

Portfolio Optimization under Recursive Utility via Reinforcement Learning

Minkey Chang

Abstract

We study whether a risk-sensitive objective from asset-pricing theory, recursive utility, improves reinforcement learning for portfolio allocation. The Bellman equation under recursive utility involves a certainty equivalent (CE) of future value that has no closed form under observed returns; we approximate it by $K$-sample Monte Carlo and train actor-critic agents (PPO, A2C) on the resulting value target and an approximate advantage estimate (AAE) that generalizes the Bellman residual to multiple steps with state-dependent weights. This formulation applies only to critic-based algorithms. On 10 chronological train/test splits of South Korean ETF data, the recursive-utility agent improves on the discounted (naive) baseline in Sharpe ratio, max drawdown, and cumulative return. Derivations, world model and metrics, and full result tables are in the appendices.
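
The $K$-sample Monte Carlo approximation of a certainty equivalent can be sketched as follows. The abstract does not state the functional form, so this assumes the power certainty equivalent commonly used with Epstein-Zin preferences, `CE = (E[V^(1-gamma)])^(1/(1-gamma))`, with the expectation replaced by an average over `K` sampled next-period values. The function name, the lognormal sampler, and the parameter values are illustrative only.

```python
import math
import random


def certainty_equivalent(values, gamma):
    """Power certainty equivalent of positive sampled values.

    gamma is the relative risk aversion; gamma = 1 is the log limit.
    The mean over `values` is the K-sample Monte Carlo estimate of the
    expectation inside the certainty equivalent.
    """
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty list of positive numbers")
    k = len(values)
    if abs(gamma - 1.0) < 1e-12:
        return math.exp(sum(math.log(v) for v in values) / k)
    mean_power = sum(v ** (1.0 - gamma) for v in values) / k
    return mean_power ** (1.0 / (1.0 - gamma))


if __name__ == "__main__":
    rng = random.Random(0)
    K = 64  # number of Monte Carlo samples (assumed)
    # Hypothetical next-period values drawn from a lognormal distribution.
    samples = [math.exp(rng.gauss(0.05, 0.2)) for _ in range(K)]
    mean_value = sum(samples) / K
    for gamma in (0.5, 2.0, 5.0):
        ce = certainty_equivalent(samples, gamma)
        print(f"gamma={gamma}: CE={ce:.4f} (sample mean {mean_value:.4f})")
```

For any positive risk aversion the estimate lies at or below the sample mean, which is what makes the resulting value target risk-sensitive compared with a plain discounted expectation.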

2603.22831 2026-03-25 cs.CE q-fin.MF

Option pricing model under the G-expectation framework

Ziting Pei, Xingye Yue, Xiaotao Zheng

Abstract

G-expectation, as a sublinear expectation, provides a powerful framework for modeling uncertainty in financial markets. Motivated by the need for robust valuation under model uncertainty, this work develops a unified risk-neutral valuation approach within the G-expectation environment, yielding a nonlinear generalization of the Black-Scholes model, termed the G-Black-Scholes equation. To enhance computational efficiency and reduce numerical cost, we introduce a logarithmic transformation of the asset price, which yields an alternative nonlinear PDE. Based on this transformed formulation, we design both explicit and implicit finite difference schemes that are rigorously demonstrated to be consistent, stable, monotone, and convergent to the viscosity solution. Numerical examples confirm that the proposed schemes achieve high accuracy, while the logarithmic transformation relaxes the stability constraints of explicit schemes and improves computational efficiency.

2603.22569 2026-03-25 q-fin.RM stat.ME

Proxy-Reliance Control in Conformal Recalibration of One-Sided Value-at-Risk

Tenghan Zhong

Comments: 44 pages, 4 figures, 9 tables, appendix included

Abstract

We introduce a proxy-reliance-controlled conformal recalibration framework for one-sided Value-at-Risk (VaR), and study a question that existing state-aware methods do not usually isolate: how strongly should the recalibration adjustment depend on an imperfect volatility proxy? We formalize this through a proxy-reliance parameter that continuously interpolates between an approximately constant-shift correction and a fully proxy-scaled correction. This makes proxy reliance a distinct and practically interpretable design choice in one-sided VaR recalibration. We show theoretically that larger proxy reliance increases the responsiveness of the tail adjustment to proxy scale, but also increases stressed-state fragility when the proxy underreacts. Empirically, in rolling out-of-sample tests on a six-ETF panel with VIX-linked state variables, and with supporting evidence from SPY, we find that the empirical value of proxy-reliance control lies in improved stressed-state robustness rather than uniform overall dominance. In particular, when the baseline forecast remains exposed to proxy imperfection in stressed states, lower or intermediate proxy reliance can outperform fully proxy-scaled recalibration in stressed left-tail VaR control.

2602.07023 2026-03-25 q-fin.TR cs.AI

Behavioral Consistency Validation for LLM Agents: An Analysis of Trading-Style Switching through Stock-Market Simulation

Zeping Li, Guancheng Wan, Keyang Chen, Yu Chen, Yiwen Zhao, Philip Torr, Guangnan Ye, Zhenfei Yin, Hongfeng Chai

Abstract

Recent studies have increasingly applied Large Language Models (LLMs) as agents in financial stock market simulations to test whether micro-level behaviors aggregate into macro-level phenomena. However, a crucial question arises: Do LLM agents' behaviors align with those of real market participants? This alignment is key to the validity of simulation results. To explore this, we select a financial stock market scenario to test behavioral consistency. Investors are typically classified as fundamental or technical traders, but most simulations fix strategies at initialization, failing to reflect real-world trading dynamics. In this work, we assess whether agents' strategy switching aligns with financial theory, providing a framework for this evaluation. We operationalize four behavioral-finance drivers (loss aversion, herding, wealth differentiation, and price misalignment) as personality traits set via prompting and stored long-term. In year-long simulations, agents process daily price-volume data, trade under a designated style, and reassess their strategy every 10 trading days. We introduce four alignment metrics and use Mann-Whitney U tests to compare agents' style-switching behavior with financial theory. Our results show that recent LLMs' switching behavior is only partially consistent with behavioral-finance theories, highlighting the need for further refinement in aligning agent behavior with financial theory.
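
For readers unfamiliar with the Mann-Whitney U test mentioned above, here is a small self-contained sketch of the statistic with a normal-approximation p-value. It omits the tie and continuity corrections, so the p-value is only rough for small samples, and the two groups of switch counts are made-up numbers rather than data from the paper. In practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used instead.

```python
import math


def midranks(values):
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for x, U for y, two-sided normal-approximation p-value)."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u1 - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u1, u2, p_value


if __name__ == "__main__":
    # Hypothetical style-switch counts per agent for two trait settings.
    loss_averse = [7, 9, 6, 8, 10, 9]
    baseline = [4, 5, 6, 3, 5, 4]
    print(mann_whitney_u(loss_averse, baseline))
```

The test compares the two groups through ranks only, which suits count-like switching data where normality is doubtful.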

2512.06033 2026-03-25 cs.CR econ.GN q-fin.EC

Sell Data to AI Algorithms Without Revealing It: Secure Data Valuation and Sharing via Homomorphic Encryption

Michael Yang, Ruijiang Gao, Zhiqiang Zheng

Abstract

The rapid expansion of Artificial Intelligence is hindered by a fundamental friction in data markets: the value-privacy dilemma, where buyers cannot verify a dataset's utility without inspection, yet inspection may expose the data (Arrow's Information Paradox). We resolve this challenge by introducing the Trustworthy Influence Protocol (TIP), a privacy-preserving framework that enables prospective buyers to quantify the utility of external data without ever decrypting the raw assets. By integrating Homomorphic Encryption with gradient-based influence functions, our approach allows for the precise, blinded scoring of data points against a buyer's specific AI model. To ensure scalability for Large Language Models (LLMs), we employ low-rank gradient projections that reduce computational overhead while maintaining near-perfect fidelity to plaintext baselines, as demonstrated across BERT and GPT-2 architectures. Empirical simulations in healthcare and generative AI domains validate the framework's economic potential: we show that encrypted valuation signals achieve a high correlation with realized clinical utility and reveal a heavy-tailed distribution of data value in pre-training corpora where a minority of texts drive capability while the majority degrades it. These findings challenge prevailing flat-rate compensation models and offer a scalable technical foundation for a meritocratic, secure data economy.

2511.19186 2026-03-25 q-fin.PM

Carbon-Penalised Portfolio Insurance Strategies in a Stochastic Factor Model with Partial Information

Katia Colaneri, Federico D'Amario, Daniele Mancinelli

Abstract

Given the increasing importance of environmental, social and governance (ESG) factors, particularly carbon emissions, we investigate optimal proportional portfolio insurance (PPI) strategies accounting for carbon footprint reduction. PPI strategies enable investors to mitigate downside risk while retaining the potential for upside gains. This paper aims to determine the multiplier of the PPI strategy to maximise the expected utility of the terminal cushion, where the terminal cushion is penalised proportionally to the realised volatility of stocks issued by firms operating in carbon-intensive sectors. We model the risky assets' dynamics using geometric Brownian motions whose drift rates are modulated by an unobservable common stochastic factor to capture market-specific or economy-wide state variables that are typically not directly observable. Using classical stochastic filtering theory, we formulate a suitable optimization problem and solve it for a CRRA utility function. We characterise optimal carbon-penalised PPI strategies and optimal value functions under full and partial information and quantify the loss of utility due to incomplete information. Finally, we carry out a numerical analysis showing that the proposed strategy reduces carbon emission intensity without compromising financial performance.
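
The basic mechanics of a proportional portfolio insurance rule can be sketched in discrete time: at each rebalancing date the risky exposure is set to the multiplier times the cushion (portfolio value minus floor), and the rest earns the risk-free rate. This sketch covers only the generic rule with a fixed multiplier. It does not include the carbon penalty, the unobservable factor, or the optimal multiplier derived in the paper, and the function name and numbers are hypothetical.

```python
def simulate_ppi(initial_value, floor, multiplier, risky_returns, risk_free_rate=0.0):
    """Simulate a PPI strategy and return the path of portfolio values.

    risky_returns holds simple per-period returns of the risky asset.
    The floor grows at the risk-free rate each period.
    """
    value = initial_value
    path = [value]
    for r in risky_returns:
        cushion = max(value - floor, 0.0)
        exposure = multiplier * cushion      # amount held in the risky asset
        safe = value - exposure              # remainder in the risk-free asset
        value = exposure * (1 + r) + safe * (1 + risk_free_rate)
        floor *= 1 + risk_free_rate
        path.append(value)
    return path


if __name__ == "__main__":
    # Hypothetical inputs: value 100, floor 80, multiplier 2, zero risk-free rate.
    print(simulate_ppi(100.0, 80.0, 2.0, [0.10, -0.10, 0.05]))
```

With a zero risk-free rate the cushion is multiplied by `1 + multiplier * r` each period, so in this discrete setting the floor is breached only if a single-period risky return falls below `-1 / multiplier`.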