arXivDaily: arXiv daily academic digest, updated Monday to Friday
2602.06607 2026-02-09 cs.DL cs.CY econ.GN q-fin.EC

Beyond Pairwise Distance: Cognitive Traversal Distance as a Holistic Measure of Scientific Novelty

Yi Xiang, Pascal Welke, Chengzhi Zhang, Jian Wang

Abstract

Scientific novelty is a critical construct in bibliometrics and is commonly measured by aggregating pairwise distances between the knowledge units underlying a paper. While prior work has refined how such distances are computed, less attention has been paid to how dyadic relations are aggregated to characterize novelty at the paper level. We address this limitation by introducing a network-based indicator, Cognitive Traversal Distance (CTD). Conceptualizing the historical literature as a weighted knowledge network, CTD is defined as the length of the shortest path required to connect all knowledge units associated with a paper. CTD provides a paper-level novelty measure that reflects the minimal structural distance needed to integrate multiple knowledge units, moving beyond mean- or quantile-based aggregation of pairwise distances. Using 27 million biomedical publications indexed by OpenAlex and Medical Subject Headings (MeSH) as standardized knowledge units, we evaluate CTD against expert-based novelty benchmarks from F1000Prime-recommended papers and Nobel Prize-winning publications. CTD consistently outperforms conventional aggregation-based indicators. We further show that MeSH-based CTD is less sensitive to novelty driven by the emergence of entirely new conceptual labels, clarifying its scope relative to recent text-based measures.
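
A minimal sketch of the contrast the abstract draws, under stated assumptions: the knowledge network is a small weighted graph whose edge weights are distances (the values below are made up), and "the shortest path required to connect all knowledge units" is read as the shortest walk through the network that visits every unit of one paper. The abstract does not specify the exact construction (it could, for example, be a Steiner-tree length instead), so `traversal_distance` and the other helper names are hypothetical stand-ins, compared against the conventional mean of pairwise distances.

```python
import heapq
import itertools
from statistics import mean


def build_graph(edges):
    """Undirected weighted graph as {node: {neighbor: distance}}."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def dijkstra(graph, source):
    """Shortest-path distance from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def unit_distances(graph, units):
    """Network distances between every pair of a paper's knowledge units."""
    table = {}
    for u in units:
        dist = dijkstra(graph, u)
        table[u] = {v: dist.get(v, float("inf")) for v in units}
    return table


def mean_pairwise_distance(table, units):
    """Conventional aggregation: average of the pairwise distances."""
    return mean(table[u][v] for u, v in itertools.combinations(units, 2))


def traversal_distance(table, units):
    """Hypothetical CTD stand-in: shortest walk visiting all units (brute force)."""
    return min(
        sum(table[a][b] for a, b in zip(order, order[1:]))
        for order in itertools.permutations(units)
    )


# Toy knowledge network with made-up distances (e.g., 1 / co-occurrence strength).
EDGES = [("A", "B", 1.0), ("B", "C", 1.0), ("C", "D", 1.0), ("A", "D", 4.0)]
graph = build_graph(EDGES)
units = ["A", "C", "D"]  # knowledge units of one hypothetical paper
table = unit_distances(graph, units)
print(mean_pairwise_distance(table, units))  # 2.0
print(traversal_distance(table, units))      # 3.0
```

Brute-forcing all orderings only works for a handful of units per paper; scaling to millions of publications would need an approximation or a different formulation.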

2602.06415 2026-02-09 q-fin.MF q-fin.PR stat.ME

Joint survival annuity derivative valuation in the linear-rational Wishart mortality model

Jose Da Fonseca, Patrick Wong

Abstract

This study proposes a linear-rational joint survival mortality model based on the Wishart process. The Wishart process, a continuous-time matrix-valued affine stochastic process, allows for general dependence between the mortality intensities, which are constructed to be positive. Using the linear-rational framework with the Wishart process as the state variable, we derive closed-form expressions for the joint survival annuity and for the guaranteed joint survival annuity option. Exploiting our parameterisation of the Wishart process, we make explicit the distribution of the mortality intensities and their dependence. We provide the distribution (density and cumulative distribution function) of the joint survival annuity. We also develop polynomial expansions for the underlying state variable that lead to fast and accurate approximations for the guaranteed joint survival annuity option. These polynomial expansions also significantly simplify the implementation of the model. Overall, the linear-rational Wishart mortality model provides a flexible and unified framework for modelling and managing joint mortality risk.
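
The paper's results are closed-form; as a hedged numerical companion only, the sketch below prices a joint survival (joint-life) annuity by Monte Carlo. It relies on the standard fact that, for integer degrees of freedom K, X_t = sum_k v_k v_k^T with independent Ornstein-Uhlenbeck vectors dv_k = M v_k dt + Q^T dW_k is a Wishart process. Assumptions not taken from the abstract: the two lives' intensities are the diagonal entries X_11 and X_22 (not the paper's linear-rational construction), deaths are conditionally independent given the intensities, annual payments are discounted at a flat rate, and all parameter values and function names are hypothetical.

```python
import math
import random


def simulate_joint_survival(M, Q, v0, r, years, steps_per_year=12, n_paths=500, seed=0):
    """Monte Carlo joint-survival curve and joint-life annuity value (Euler scheme)."""
    rng = random.Random(seed)
    dt = 1.0 / steps_per_year
    sqdt = math.sqrt(dt)
    survival_sum = [0.0] * years
    for _ in range(n_paths):
        vs = [list(v) for v in v0]  # K = len(v0) independent OU vectors
        cumulative_intensity = 0.0
        for year in range(years):
            for _ in range(steps_per_year):
                # hypothetical intensities: diagonal of X = sum_k v_k v_k^T (>= 0)
                mu1 = sum(v[0] ** 2 for v in vs)
                mu2 = sum(v[1] ** 2 for v in vs)
                cumulative_intensity += (mu1 + mu2) * dt
                for v in vs:
                    dw0, dw1 = rng.gauss(0.0, sqdt), rng.gauss(0.0, sqdt)
                    # Euler step for dv = M v dt + Q^T dW
                    x0 = v[0] + (M[0][0] * v[0] + M[0][1] * v[1]) * dt + Q[0][0] * dw0 + Q[1][0] * dw1
                    x1 = v[1] + (M[1][0] * v[0] + M[1][1] * v[1]) * dt + Q[0][1] * dw0 + Q[1][1] * dw1
                    v[0], v[1] = x0, x1
            # probability both lives survive to this year end, given the path
            survival_sum[year] += math.exp(-cumulative_intensity)
    survival = [s / n_paths for s in survival_sum]
    # annuity pays 1 at each year end while both lives are alive
    annuity = sum(math.exp(-r * (t + 1)) * p for t, p in enumerate(survival))
    return survival, annuity


M = [[0.08, 0.0], [0.0, 0.07]]          # hypothetical mean-reversion matrix
Q = [[0.003, 0.0], [0.001, 0.003]]      # hypothetical volatility matrix
v0 = [[0.06, 0.01], [0.02, 0.07], [0.03, 0.03]]  # K = 3 initial vectors
survival, annuity = simulate_joint_survival(M, Q, v0, r=0.02, years=20)
print(f"P(both alive after 20y) ~ {survival[-1]:.3f}, annuity ~ {annuity:.2f}")
```

Because the shared vectors v_k load on both lives, shocks to one intensity feed into the other, which is a crude version of the dependence the Wishart structure is meant to capture.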

2602.06401 2026-02-09 q-fin.RM q-fin.MF

Wishart conditional tail risk measures: An analytic approach

Jose Da Fonseca, Patrick Wong

Abstract

This study introduces a new analytical framework for quantifying multivariate risk measures. Using the Wishart process, a stochastic process with values in the space of positive definite matrices, we derive several conditional tail risk measures which, thanks to the remarkable analytical properties of the Wishart process, can be computed explicitly up to a one- or two-dimensional integral. These quantities can also be used to analytically solve a capital allocation problem based on conditional moments. Exploiting the stochastic differential equation representation of the Wishart process, we show how an intertemporal (i.e., time-lagged) view of these risk measures can be embedded in the proposed framework. Several numerical examples show that the framework is versatile and operational, providing a useful tool for risk management.
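
The paper computes these measures analytically; the sketch below is only a hedged Monte Carlo illustration of one conditional-moment capital allocation. Assumptions not in the abstract: the risk vector is the diagonal of a 2x2 central Wishart matrix W = sum_k z_k z_k^T with z_k ~ N(0, Sigma) and integer degrees of freedom (a static draw, not the process), and capital is allocated as E[X_i | S >= VaR_alpha(S)] with S = X_1 + X_2. The function names and parameter values are hypothetical.

```python
import math
import random


def sample_wishart_2x2(sigma, dof, rng):
    """Central Wishart draw as a sum of outer products of N(0, sigma) vectors."""
    # Cholesky factor of the 2x2 scale matrix
    l11 = math.sqrt(sigma[0][0])
    l21 = sigma[1][0] / l11
    l22 = math.sqrt(sigma[1][1] - l21 ** 2)
    w = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(dof):
        g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z = (l11 * g1, l21 * g1 + l22 * g2)
        for i in range(2):
            for j in range(2):
                w[i][j] += z[i] * z[j]
    return w


def tail_allocation(sigma, dof, alpha, n, seed=0):
    """VaR and tail conditional expectation of S = X1 + X2, plus per-component allocation."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        w = sample_wishart_2x2(sigma, dof, rng)
        draws.append((w[0][0], w[1][1]))  # hypothetical risk vector: the diagonal
    totals = sorted(x1 + x2 for x1, x2 in draws)
    var = totals[min(int(alpha * n), n - 1)]
    tail = [(x1, x2) for x1, x2 in draws if x1 + x2 >= var]
    tce = sum(x1 + x2 for x1, x2 in tail) / len(tail)
    alloc = (sum(x1 for x1, _ in tail) / len(tail), sum(x2 for _, x2 in tail) / len(tail))
    return var, tce, alloc


SIGMA = [[1.0, 0.5], [0.5, 2.0]]
var, tce, alloc = tail_allocation(SIGMA, dof=4, alpha=0.99, n=20000)
print(f"VaR {var:.2f}, TCE {tce:.2f}, allocation {alloc[0]:.2f} + {alloc[1]:.2f}")
```

The allocations add up to the tail conditional expectation of the total, which is the property that makes conditional moments a natural basis for capital allocation.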

2602.06394 2026-02-09 cs.AI cs.CE q-bio.GN q-fin.CP

Unlocking Noisy Real-World Corpora for Foundation Model Pre-Training via Quality-Aware Tokenization

Arvid E. Gollwitzer, Paridhi Latawa, David de Gruijl, Deepak A. Subramanian, Adrián Noriega de la Colina

Abstract

Current tokenization methods process sequential data without accounting for signal quality, limiting their effectiveness on noisy real-world corpora. We present QA-Token (Quality-Aware Tokenization), which incorporates data reliability directly into vocabulary construction. We make three key contributions: (i) a bilevel optimization formulation that jointly optimizes vocabulary construction and downstream performance, (ii) a reinforcement learning approach, with convergence guarantees, that learns merge policies through quality-aware rewards, and (iii) an adaptive parameter learning mechanism via Gumbel-Softmax relaxation for end-to-end optimization. Experiments show consistent improvements in genomics (a 6.7-percentage-point F1 gain in variant calling over BPE) and finance (a 30% improvement in Sharpe ratio). At foundation scale, we tokenize a pretraining corpus comprising 1.7 trillion base pairs and achieve state-of-the-art pathogen detection (94.53 MCC) while reducing token count by 15%. We unlock noisy real-world corpora, spanning petabases of genomic sequences and terabytes of financial time series, for foundation model training with zero inference overhead.
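
To make "incorporating data reliability into vocabulary construction" concrete, here is a hedged sketch of a much simpler idea than the paper's bilevel and reinforcement-learning method: greedy BPE in which each adjacent pair is counted with weight q_left * q_right, so low-confidence positions contribute less. The weighting rule, the min-confidence rule for merged tokens, the Phred conversion helper, and all names are assumptions for illustration.

```python
from collections import defaultdict


def phred_to_confidence(q):
    """Probability a base call is correct, from its Phred quality score."""
    return 1.0 - 10.0 ** (-q / 10.0)


def pair_scores(corpus):
    """Quality-weighted count of each adjacent token pair."""
    scores = defaultdict(float)
    for seq in corpus:
        for (a, qa), (b, qb) in zip(seq, seq[1:]):
            scores[(a, b)] += qa * qb  # noisy positions contribute less
    return scores


def merge_pair(seq, pair):
    """Replace every left-to-right occurrence of pair with one merged token."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i][0], seq[i + 1][0]) == pair:
            # merged token keeps the weaker of its two confidences
            out.append((pair[0] + pair[1], min(seq[i][1], seq[i + 1][1])))
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_merges(texts, qualities, num_merges):
    """Greedy quality-weighted BPE; returns the merge list and the final tokenization."""
    corpus = [list(zip(t, q)) for t, q in zip(texts, qualities)]
    merges = []
    for _ in range(num_merges):
        scores = pair_scores(corpus)
        if not scores:
            break
        best = max(scores, key=lambda p: (scores[p], p))  # deterministic tie-break
        merges.append(best)
        corpus = [merge_pair(seq, best) for seq in corpus]
    return merges, corpus


# Toy reads: a low-confidence homopolymer run and a high-confidence repeat.
texts = ["TTTT", "GCGC"]
quality_aware, _ = learn_merges(texts, [[0.1] * 4, [0.9] * 4], num_merges=1)
count_only, _ = learn_merges(texts, [[1.0] * 4, [1.0] * 4], num_merges=1)
print(quality_aware)  # [('G', 'C')]
print(count_only)     # [('T', 'T')]
```

Plain frequency-based BPE would spend its first merge on the noisy homopolymer; down-weighting by confidence moves it to the reliable repeat instead.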

2510.00244 2026-02-09 q-fin.GN cs.CY cs.LG

Board gender diversity and emissions performance: Insights from panel regressions, machine learning, and explainable AI

Mohammad Hassan Shakil, Arne Johan Pollestad, Khine Kyaw, Ziaul Haque Munim

Comments 18 pages

Journal ref 10.1016/j.jenvman.2026.128776

Abstract

With European Union initiatives mandating gender quotas on corporate boards, a key question arises: Is greater board gender diversity (BGD) associated with better emissions performance (EP)? To answer this question, we examine the influence of BGD on EP across a sample of European firms from 2016 to 2022. Using panel regressions, advanced machine learning algorithms, and explainable AI, we reveal a non-linear relationship. Specifically, EP improves with BGD up to an optimal level of approximately 35%, beyond which further increases in BGD yield no additional improvement in EP. A minimum BGD threshold of 22% is necessary for meaningful improvements in EP. To assess the legitimacy of EP outcomes, this study examines whether ESG controversies weaken the BGD-EP relationship. The results show no significant effect, suggesting that BGD's impact is driven by governance mechanisms rather than symbolic actions. Additionally, path analysis indicates that while environmental innovation contributes to EP, it is not the mediating channel through which BGD promotes EP. The results have implications for academics, businesses, and regulators.

2502.08548 2026-02-09 econ.GN q-fin.EC

Separating Advertising and Marketplace Functions of E-commerce Platforms: Is it Social Welfare Enhancing?

Zhe Zhang, Young Kwark, Srinivasan Raghunathan, Peng Wang

Abstract

The use of sponsored product listings in prominent positions of consumer search results has made e-commerce platforms, which traditionally serve as marketplaces for third-party sellers to reach consumers, a major medium for those sellers to advertise their products. However, regulators have expressed antitrust concerns about an e-commerce platform's integration of marketplace and advertising functions; they argue that such integration benefits the platform and sellers at the expense of consumers and society, and they have proposed separating the advertising function from those platforms. Contrary to regulators' concerns, we show that separating the advertising function from the e-commerce platform benefits the sellers, hurts the consumers, and does not necessarily increase social welfare. A key driver of our findings is that an independent advertising firm, which relies solely on advertising revenue, has the same or a weaker economic incentive to improve targeting precision than an e-commerce platform that also serves as the advertising medium, even if both have the same ability to target consumers. This is because an improvement in targeting precision enhances the marketplace commission by softening the price competition between sellers, but hurts advertising revenue by softening the competition for prominent ad positions.

2501.00382 2026-02-09 econ.GN cs.AI q-fin.EC stat.AP stat.ML

Adventures in Demand Analysis Using AI

Philipp Bach, Victor Chernozhukov, Sven Klaassen, Martin Spindler, Jan Teichert-Kluge, Suhas Vijaykumar

Comments 35 pages, 8 figures

Abstract

This paper advances empirical demand analysis by integrating multimodal product representations derived from artificial intelligence (AI). Using a detailed dataset of toy cars on Amazon.com, we combine text descriptions, images, and tabular covariates to represent each product using transformer-based embedding models. These embeddings capture nuanced attributes, such as quality, branding, and visual characteristics, that traditional methods often struggle to summarize. Moreover, we fine-tune these embeddings for causal inference tasks. We show that the resulting embeddings substantially improve the predictive accuracy of sales ranks and prices and that they lead to more credible causal estimates of price elasticity. Notably, we uncover strong heterogeneity in price elasticity driven by these product-specific features. Our findings illustrate that AI-driven representations can enrich and modernize empirical demand analysis. The insights generated may also prove valuable for applied causal inference more broadly.

2403.02572 2026-02-09 q-fin.TR math.PR q-fin.CP

Fill Probabilities in a Limit Order Book with State-Dependent Stochastic Order Flows

Felix Lokin, Fenghui Yu

Abstract

This paper studies the fill probabilities of limit orders placed at different price levels in a limit order book. These probabilities play a central role in execution optimization, as limit orders are not guaranteed to be executed and inherently involve a trade-off between execution cost and execution risk. We model the limit order book within a general state-dependent stochastic framework, representing its dynamics as a collection of interacting queuing systems while incorporating key stylized market features. Within this framework, we derive semi-analytical expressions for several quantities of interest under state-dependent order flows, including the probability of a mid-price change, the fill probabilities of orders placed at the best quotes, and those of orders placed deeper in the book before the opposite best quote moves. While the framework can be extended to even deeper price levels, the corresponding fill probabilities are typically negligible. We validate the proposed model through extensive numerical experiments using real foreign exchange spot market data. The results demonstrate that the model remains tractable while capturing essential order book dynamics, and that the derived expressions achieve good accuracy in estimating fill probabilities.
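
The paper derives semi-analytical expressions; as a hedged illustration of the underlying question only, the sketch below estimates by simulation the probability that a limit order at the best bid is filled before the best ask queue is depleted. The queue dynamics are a simple assumption, not the paper's model: limit orders arrive at a hypothetical state-dependent rate `arrival_rate(q)` that slows as a queue grows, every resting order except ours cancels at rate `theta`, and market orders arrive at rate `mu` on each side. Only the embedded jump chain is simulated, since timing does not affect which event happens first.

```python
import random


def arrival_rate(queue_size, base=1.0, scale=5.0):
    """Hypothetical state-dependent limit-order arrival rate: slows as the queue grows."""
    return base / (1.0 + queue_size / scale)


def fill_probability(n_ahead, bid_size, ask_size, mu=1.0, theta=0.1,
                     n_paths=2000, max_events=100_000, seed=0):
    """Share of paths where our bid order fills before the best ask queue empties."""
    if bid_size < n_ahead + 1:
        raise ValueError("bid queue must contain our order and those ahead of it")
    rng = random.Random(seed)
    filled = 0
    for _ in range(n_paths):
        ahead, behind, ask = n_ahead, bid_size - n_ahead - 1, ask_size
        for _ in range(max_events):
            rates = [
                arrival_rate(ahead + 1 + behind),  # 0: new bid joins behind us
                theta * ahead,                     # 1: cancellation ahead of us
                theta * behind,                    # 2: cancellation behind us
                mu,                                # 3: market sell hits the bid
                arrival_rate(ask),                 # 4: new ask limit order
                theta * ask,                       # 5: ask cancellation
                mu,                                # 6: market buy hits the ask
            ]
            u = rng.random() * sum(rates)
            event, acc = 0, rates[0]
            while u >= acc and event < len(rates) - 1:
                event += 1
                acc += rates[event]
            if event == 0:
                behind += 1
            elif event == 1:
                ahead -= 1
            elif event == 2:
                behind -= 1
            elif event == 3:
                if ahead == 0:
                    filled += 1  # market sell reaches our order
                    break
                ahead -= 1
            elif event == 4:
                ask += 1
            else:
                ask -= 1  # ask cancellation or market buy
            if ask == 0:
                break  # opposite best quote moved before our order was filled
    return filled / n_paths


p_front = fill_probability(n_ahead=0, bid_size=5, ask_size=5)
p_deep = fill_probability(n_ahead=20, bid_size=21, ask_size=5)
print(f"front of queue: {p_front:.3f}, behind 20 orders: {p_deep:.3f}")
```

Swapping in a different `arrival_rate` or cancellation rule changes the state dependence without touching the simulation loop.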

2309.14186 2026-02-09 econ.GN q-fin.EC

Value-transforming financial, carbon and biodiversity footprint accounting

S. El Geneidy, M. Peura, V. M. Aumanen, S. Baumeister, U. Helimo, V. Vainio, J. S. Kotiaho

Abstract

Transformative changes in our production and consumption habits are needed to halt biodiversity loss. Organizations are how we humans organize our everyday lives, and many of our negative environmental impacts, expressed as carbon and biodiversity footprints, are caused by organizations. Here we explore how the accounts of any organization can be used to develop an integrated carbon and biodiversity footprint account. As a metric, we use spatially explicit potential global loss of species across all ecosystem types and argue that it can be understood as the biodiversity equivalent. The biodiversity equivalent could serve biodiversity much as the carbon dioxide equivalent serves climate. We provide a global, country-specific dataset that organizations, experts, and researchers can use to assess consumption-based biodiversity footprints. We also argue that the current integration of financial and environmental accounting is superficial, and we provide a framework for a more robust, value-transforming financial accounting model. To test the methodologies, we used a Finnish university as a living lab. Assigning an offsetting cost to the footprints significantly altered the financial value of the organization. We believe such value-transforming accounting is needed to draw the attention of senior executives and investors to the negative environmental impacts of their organizations.
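
The value-transforming step can be sketched as simple arithmetic, under assumptions: consumption-based footprints are computed as spend per category multiplied by an intensity factor, and the offsetting cost of both footprints is subtracted from the organization's reported value. The categories, factors, prices, and function names are placeholders for illustration, not figures or methods from the paper.

```python
def footprint(spend, factors):
    """Consumption-based footprint: spend per category times an intensity factor."""
    return sum(amount * factors[category] for category, amount in spend.items())


def value_transformed(net_value, spend, carbon_factors, biodiversity_factors,
                      carbon_price, biodiversity_price):
    """Return (carbon, biodiversity, value after subtracting offsetting costs)."""
    carbon = footprint(spend, carbon_factors)
    biodiversity = footprint(spend, biodiversity_factors)
    offset_cost = carbon * carbon_price + biodiversity * biodiversity_price
    return carbon, biodiversity, net_value - offset_cost


# Placeholder figures for illustration only (not from the paper).
spend = {"energy": 2_000_000, "food": 500_000}         # EUR
carbon_factors = {"energy": 0.0002, "food": 0.0004}    # t CO2e per EUR
biodiversity_factors = {"energy": 1e-9, "food": 4e-9}  # biodiversity equivalents per EUR
carbon, biodiversity, adjusted = value_transformed(
    net_value=10_000_000, spend=spend,
    carbon_factors=carbon_factors, biodiversity_factors=biodiversity_factors,
    carbon_price=100.0, biodiversity_price=5e8,
)
print(round(carbon, 6), round(biodiversity, 9), round(adjusted, 2))
```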

2602.06263 2026-02-09 econ.GN cs.HC cs.SY eess.SY q-fin.EC

Chasing Tails: How Do People Respond to Wait Time Distributions?

Evgeny Kagan, Kyle Hyndman, Andrew Davis

Abstract

We use a series of pre-registered, incentive-compatible online experiments to investigate how people evaluate and choose among different waiting time distributions. Our main findings are threefold. First, consistent with prior literature, people show an aversion to both longer expected waits and higher variance. Second, and more surprisingly, moment-based utility models fail to capture preferences when distributions have thick right tails: decision-makers strongly prefer distributions with long right tails (where probability mass is spread more evenly over a larger support) to tails that exhibit a spike near the maximum possible value, even when controlling for mean, variance, and higher moments. Conditional Value at Risk (CVaR) utility models, commonly used in portfolio theory, predict these choices well. Third, when given a choice, decision-makers overwhelmingly seek information about right-tail outcomes. These results have practical implications for service operations: (1) service designs that create a spike in long waiting times (such as priority or dedicated queue designs) may be particularly aversive; (2) when informativeness is the goal, providers should prioritize sharing right-tail probabilities or percentiles; and (3) to increase service uptake, providers can strategically disclose (or withhold) distributional information depending on right-tail shape.
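
A hedged sketch of how a CVaR criterion separates the two tail shapes: CVaR at level alpha is taken here as the average wait over the worst (1 - alpha) probability mass. The two distributions, the choice alpha = 0.9, and the function names are hypothetical; unlike the experiments, this toy pair matches only the mean (7.5 minutes), not the variance or higher moments.

```python
def expected_value(dist):
    """Mean of a discrete distribution given as (value, probability) pairs."""
    return sum(value * prob for value, prob in dist)


def cvar(dist, alpha):
    """Average of the worst (1 - alpha) probability mass of waits (higher = worse)."""
    tail = 1.0 - alpha
    remaining, total = tail, 0.0
    for value, prob in sorted(dist, reverse=True):  # longest waits first
        take = min(prob, remaining)
        total += take * value
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / tail


# Hypothetical wait-time distributions: (minutes, probability), both with mean 7.5.
long_tail = [(5, 0.80), (10, 0.05), (15, 0.05), (20, 0.05), (25, 0.05)]
spike_tail = [(5, 0.875), (25, 0.125)]

for name, dist in [("long tail", long_tail), ("spike", spike_tail)]:
    print(name, round(expected_value(dist), 3), round(cvar(dist, alpha=0.9), 3))
```

With alpha = 0.9 the spike scores 25 minutes against 22.5 for the long tail, so a disutility that adds a CVaR term to the mean ranks the spike as worse, in the direction the abstract reports.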

2602.06198 2026-02-09 q-fin.ST q-fin.TR

Insider Purchase Signals in Microcap Equities: Gradient Boosting Detection of Abnormal Returns

Hangyi Zhao

Comments 9 pages, 4 figures, 4 tables

Abstract

This paper examines whether SEC Form 4 insider purchase filings predict abnormal returns in U.S. microcap stocks. The analysis covers 17,237 open-market purchases across 1,343 issuers from 2018 through 2024, restricted to market capitalizations between $30M and $500M. A gradient boosting classifier trained on insider identity, transaction history, and market conditions at disclosure achieves an AUC of 0.70 on out-of-sample 2024 data. At an optimized threshold of 0.20, precision is 0.38 and recall is 0.69. The distance from the 52-week high dominates feature importance, accounting for 36% of the predictive signal. A momentum pattern emerges in the data: transactions disclosed after price appreciation exceeding 10% yield the highest mean cumulative abnormal return (6.3%) and the highest probability of outperformance (36.7%). This contrasts with the simple mean-reversion intuition often applied to post-run-up entries. The result is robust to winsorization and holds across subsamples. These patterns are consistent with slower information incorporation in illiquid markets, where trend confirmation may filter for higher-conviction insider signals.
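
The evaluation quantities in the abstract (AUC, precision and recall at a score threshold, cumulative abnormal return) can be sketched in plain Python. The scores and labels below are toy values, not the paper's data, and the market-adjusted CAR (stock return minus market return, summed over the event window) is an assumed benchmark; the abstract does not say which abnormal-return model is used.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when predicting 1 for every score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def auc(scores, labels):
    """Probability a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cumulative_abnormal_return(stock_returns, market_returns):
    """Market-adjusted CAR: sum of (stock return - market return) over the window."""
    return sum(s - m for s, m in zip(stock_returns, market_returns))


# Toy classifier scores and outperformance labels (1 = outperformed).
scores = [0.9, 0.35, 0.25, 0.22, 0.15, 0.10, 0.05, 0.3]
labels = [1, 0, 1, 0, 1, 0, 0, 0]
print(precision_recall_at(scores, labels, threshold=0.20))
print(auc(scores, labels))
print(cumulative_abnormal_return([0.02, 0.01, -0.01], [0.01, 0.0, -0.02]))
```

A low threshold such as 0.20 trades precision for recall, which matches the abstract's operating point (precision 0.38, recall 0.69) in spirit if not in numbers.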

2512.12783 2026-02-09 cs.LG q-fin.ST stat.AP

Credit Risk Estimation with Non-Financial Features: Evidence from a Synthetic Istanbul Dataset

Atalay Denknalbant, Emre Sezdi, Zeki Furkan Kutlu

Abstract

Financial exclusion constrains entrepreneurship, increases income volatility, and widens wealth gaps. Underbanked consumers in Istanbul often have no bureau file because their earnings and payments flow through informal channels. To study how such borrowers can be evaluated, we create a synthetic dataset of one hundred thousand Istanbul residents that reproduces first-quarter 2025 TÜİK (TURKSTAT) census marginals and telecom usage patterns. Retrieval-augmented generation feeds these public statistics into the OpenAI o3 model, which synthesises realistic yet private records. Each profile contains seven sociodemographic variables and nine alternative attributes that describe phone specifications, online shopping rhythm, subscription spend, car ownership, monthly rent, and a credit card flag. To test the impact of the alternative financial data, CatBoost, LightGBM, and XGBoost are each trained in two versions: Demo models use only the sociodemographic variables, while Full models include both sociodemographic and alternative attributes. Across five-fold stratified validation, the alternative block raises the area under the curve by about 1.3 percentage points and lifts balanced F1 from roughly 0.84 to 0.95, a fourteen percent gain. We contribute an open Istanbul 2025 Q1 synthetic dataset, a fully reproducible modeling pipeline, and empirical evidence that a concise set of behavioural attributes can approach bureau-level discrimination power while serving borrowers who lack formal credit records. These findings give lenders and regulators a transparent blueprint for extending fair and safe credit access to the underbanked.
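
The evaluation protocol can be sketched without any ML library: a stratified k-fold split that keeps the default rate similar in every fold, plus a macro-averaged F1. Reading "balanced F1" as macro F1 (the unweighted mean of per-class F1) is an assumption, since the abstract does not define it; the labels and function names are hypothetical.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Split indices into k folds whose class proportions match the full sample."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    position = 0  # continues across classes so remainders spread over folds
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        for idx in members:
            folds[position % k].append(idx)
            position += 1
    return folds


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (one reading of 'balanced F1')."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced labels: 1 = default, 0 = repaid.
labels = [1] * 12 + [0] * 88
folds = stratified_k_fold(labels, k=5)
for i, fold in enumerate(folds):
    print(i, len(fold), sum(labels[j] for j in fold))  # fold size and defaults per fold
```

Each fold then serves once as the validation set while a model is trained on the other four, and the per-fold scores are averaged.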