arXivDaily: arXiv daily academic digest, updated Monday through Friday
2303.04754 2026-05-11 stat.ME stat.CO

Estimation of Long-Range Dependent Models with Missing Data: to Impute or not to Impute?

Guilherme Pumi, Gladys Choque Ulloa, Taiane Schaedler Prass

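AI summary This paper studies how to estimate the long-range dependence parameter $d$ of long-memory ARFIMA$(p,d,q)$ time series models when data are missing. It compares two main approaches: imputing the missing values first and then estimating, or using estimation methods designed directly for incomplete data. Through an extensive Monte Carlo simulation study, the authors systematically compare 35 methods across different missing-data proportions and levels of dependence, providing a reference for practical applications.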

Abstract

Among the most important models for long-range dependent time series is the class of ARFIMA$(p,d,q)$ (Autoregressive Fractionally Integrated Moving Average) models. Estimating the long-range dependence parameter $d$ in ARFIMA models is a well-studied problem, but the literature regarding the estimation of $d$ in the presence of missing data is very sparse. There are two basic approaches to dealing with the problem: missing data can be imputed using some plausible method, and then the estimation can proceed as if no data were missing, or we can use a specially tailored methodology to estimate $d$ in the presence of missing data. In this work, we review some of the methods available for both approaches and compare them through a Monte Carlo simulation study. We present a comparison among 35 different setups to estimate $d$, under tens of different scenarios, considering percentages of missing data ranging from as little as 10\% up to 70\% and several levels of dependence.
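
The "impute, then estimate" route described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not one of the paper's 35 setups: the abstract does not list the methods, so the log-periodogram (GPH) estimator, linear interpolation as the imputation rule, the sample size, and the 30% missing rate are all choices made here for illustration. The function names `simulate_arfima0d0`, `gph_estimate`, and `impute_linear` are hypothetical.

```python
import numpy as np

def simulate_arfima0d0(n, d, rng):
    """Simulate ARFIMA(0,d,0) via its MA(infinity) expansion, truncated at n lags."""
    k = np.arange(1, n)
    # psi_0 = 1, psi_k = psi_{k-1} * (k - 1 + d) / k
    psi = np.concatenate(([1.0], np.cumprod((k - 1 + d) / k)))
    eps = rng.standard_normal(2 * n)
    # Keep only the n outputs that use all n MA coefficients.
    return np.convolve(eps, psi)[n:2 * n]

def gph_estimate(x, m=None):
    """Log-periodogram regression estimate of d (assumed estimator, for illustration)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if m is None:
        m = int(np.sqrt(n))  # a common bandwidth choice, assumed here
    freqs = 2 * np.pi * np.arange(1, m + 1) / n
    dft = np.fft.fft(x - x.mean())[1:m + 1]
    periodogram = np.abs(dft) ** 2 / (2 * np.pi * n)
    # Near zero frequency, log f(lambda) is roughly c - d * log(4 sin^2(lambda / 2)).
    regressor = np.log(4 * np.sin(freqs / 2) ** 2)
    slope = np.polyfit(regressor, np.log(periodogram), 1)[0]
    return -slope

def impute_linear(x, missing):
    """Fill missing positions by linear interpolation between observed neighbours."""
    t = np.arange(x.size)
    filled = x.copy()
    filled[missing] = np.interp(t[missing], t[~missing], x[~missing])
    return filled

rng = np.random.default_rng(0)
d_true = 0.3
x = simulate_arfima0d0(4096, d_true, rng)
missing = rng.random(x.size) < 0.3  # roughly 30% of points missing at random

d_full = gph_estimate(x)
d_imputed = gph_estimate(impute_linear(x, missing))
print(f"true d = {d_true}, complete data: {d_full:.3f}, after imputation: {d_imputed:.3f}")
```

Repeating the last block over many seeds, missing rates, and values of `d_true` would give a small Monte Carlo comparison in the spirit of the study, though with far fewer methods and scenarios.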

2301.06196 2026-05-11 cs.DL

Young Male and Female Scientists: A Quantitative Exploratory Study of the Changing Demographics of the Global Scientific Workforce

Marek Kwiek, Lukasz Szymula

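AI summary Using large-scale, generational, cross-sectional, and longitudinal approaches, this study examines how the gender and age distribution of the global scientific workforce is changing, covering 4.3 million nonoccasional scientists from 38 OECD countries publishing between 1990 and 2021. It focuses on the changing proportions of young male and female scientists across 16 STEMM disciplines, finding that women already form the majority in some disciplines, that the pace of change differs by discipline, and that in one-third of disciplines the youngest female scientists already outnumber their male counterparts. The study also shows that women scientists are concentrated mainly in medicine, and it discusses the usefulness and limitations of global bibliometric data for analyzing the scientific workforce by gender, age, discipline, and time.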

Comments 40 pages, 7 tables, 12 figures

Abstract

In this study, the global scientific workforce is explored through large-scale, generational, cross-sectional, and longitudinal approaches. We examine 4.3 million nonoccasional scientists from 38 OECD countries publishing in 1990-2021. Our interest is in the changing distribution of young male and female scientists over time across 16 STEMM (science, technology, engineering, mathematics, medicine) disciplines. We unpack the details of the changing scientific workforce using age groups. Some disciplines are already numerically dominated by women, and the change is fast in some and slow in other disciplines. In one-third of disciplines, there are already more youngest female than male scientists. Across all disciplines combined, the majority of women are young women. And more than half of women scientists (55.02%) are located in medicine. The usefulness of global bibliometric data sources in analyzing the scientific workforce along gender, age, discipline, and time is tested. Traditional aggregated data about scientists in general hide a nuanced picture of the changing gender dynamics within and across disciplines and age groups. The limitations of bibliometric datasets are explored, and global studies are compared with national-level studies. The methodological choices and their implications are shown, and new opportunities for how to study scientists globally are discussed.

2212.07384 2026-05-11 econ.GN q-fin.EC

Valuing Pharmaceutical Drug Innovations

Gaurab Aryal, Federico Ciliberto, Leland E. Farmer, Ekaterina Khmelnitskaya

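AI summary This paper proposes a method for estimating the market value of pharmaceutical drugs that combines the event study method with a discounted cash flow model, inferring drug values from stock market reactions to drug development announcements. The study estimates the average market value of a drug developed by small firms at about \$2.16 billion, the risk-adjusted present value at the preclinical stage at about \$50 million, and the expected development cost at the start of the discovery stage at about \$38 million. It also estimates values and costs for several therapeutic areas and explores how these results could be used to design policies that support drug development.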

Abstract

We propose a methodology to estimate the market value of pharmaceutical drugs. Our approach combines the event study method with a discounted cash flow model that infers drug values from stock market responses to drug development announcements. We estimate the average value of a drug developed by small firms (those below the 95th percentile of market capitalization) to be \$2.16 billion. At the preclinical stage, the risk-adjusted and present discounted average net value of drugs is \$50 million. Leveraging these estimates, we also determine the expected drug development cost at the start of the discovery stage to be \$38 million. We estimate values and costs for several therapeutic areas (e.g., neoplasm, infections) and explore applying these estimates to design policies that support drug development through drug buyouts and targeted preclinical interventions.

2201.07799 2026-05-11 math.GM

A Minimum Doubly Resolving Set and Strong Resolving Set for the Crystal Cubic Carbon

Ali Zafari, Saeid Alikhani

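AI summary This paper studies the sizes of a minimum doubly resolving set and a minimum strong resolving set for the crystal cubic carbon structure $CCC(n)$. The authors propose an alternative structural representation and, building on this model, determine the sizes of the minimum doubly resolving set and strong resolving set of $CCC(n)$. The study offers a new theoretical basis and computational approach for this class of NP-hard problems.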

Comments Personal reasons, Professor Jia Bao Liu asked us not to mention his name in the article and to thank him only in the acknowledgments section

Abstract

The task of identifying resolving sets has been extensively studied due to its wide relevance in fields such as chemistry, robot navigation, combinatorial optimization, pattern recognition, and image processing. These applications have helped motivate and establish the theoretical foundations of the subject. Notably, problems of this type are generally known to be NP-hard. This study introduces an alternative structural representation for the crystal cubic carbon $CCC(n)$. Building on this model, we determine the minimum sizes of both a doubly resolving set and a strong resolving set for $CCC(n)$.
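
For readers unfamiliar with the terminology, the sketch below checks the two standard definitions by brute force on a small graph. It uses a 6-cycle as a stand-in, because the abstract does not describe the structure of $CCC(n)$; the helper names are hypothetical, and exhaustive search is only feasible for tiny graphs, which is consistent with the NP-hardness noted in the abstract. A set $S$ is resolving if the vectors of distances to $S$ are distinct for all vertices, and doubly resolving if for every pair $u \neq v$ the differences $d(u,s) - d(v,s)$ are not constant over $s \in S$.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    """Shortest-path distances from source in an unweighted connected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def all_distances(adj):
    return {v: bfs_distances(adj, v) for v in adj}

def is_resolving(dist, subset):
    """Every vertex has a distinct vector of distances to the subset."""
    vectors = {tuple(dist[v][s] for s in subset) for v in dist}
    return len(vectors) == len(dist)

def is_doubly_resolving(dist, subset):
    """For each pair u, v the differences d(u,s) - d(v,s) take at least two values."""
    for u, v in combinations(dist, 2):
        diffs = {dist[u][s] - dist[v][s] for s in subset}
        if len(diffs) < 2:
            return False
    return True

def minimum_size(adj, predicate):
    """Smallest subset size satisfying the predicate, found by exhaustive search."""
    dist = all_distances(adj)
    for k in range(1, len(adj) + 1):
        for subset in combinations(adj, k):
            if predicate(dist, subset):
                return k, subset
    return None

# Stand-in graph (an assumption, not CCC(n)): the cycle on 6 vertices.
n = 6
cycle = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

resolving_size, _ = minimum_size(cycle, is_resolving)
doubly_size, _ = minimum_size(cycle, is_doubly_resolving)
print(resolving_size, doubly_size)  # prints "2 3"
```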

2110.13814 2026-05-11 econ.GN cs.GT q-fin.EC

Bidders' Responses to Auction Format Change in Internet Display Advertising Auctions

Shumpei Goke, Gabriel Y. Weintraub, Ralph Mastromonaco, Sam Seljan

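AI summary This paper studies how bidders actually change their bidding behavior in internet display advertising auctions when a new auction format is introduced (here, a switch from second-price to first-price auctions). Using a novel dataset in which different publishers adopted first-price auctions at staggered times, the study finds that the price per sold impression rises markedly for publishers that adopted the new format relative to those that did not, with increases of 25% to 75% of the pre-treatment price level. This price increase tends to fade over time, however, suggesting that bidders initially did not shade their bids enough and only gradually adjusted toward the equilibrium of the new format. The work is one of the first field studies of bidders' responses to auction format changes and offers useful guidance for auction design.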

Comments 35 pages, 37 figures

Abstract

We study actual bidding behavior when a new auction format is introduced into the marketplace. More specifically, we investigate this question using a novel dataset on internet display advertising auctions that exploits a staggered adoption by different publishers (sellers) of first-price auctions (FPAs), instead of the traditional second-price auctions (SPAs). We analyze the auction format change using difference-in-differences regressions and a synthetic difference-in-differences estimator, which better handles pre-trends. The results show that revenue per sold impression (price) jumps considerably for treated publishers relative to control publishers, with increases ranging from 25% to 75% of the pre-treatment price level of the treated group. Moreover, for later auction format changes, the increase in price levels under FPAs relative to those under SPAs tends to dissipate over time, reminiscent of the revenue equivalence theorem, although the extent of this reversion depends on the specification. We view these results as suggestive of initially insufficient bid shading following the format change, as opposed to an immediate transition to a new Bayesian Nash equilibrium, with prices tending to decline in several specifications in a manner consistent with gradual adjustment in bidding behavior as bidders learn to shade their bids. Our work constitutes one of the first field studies on bidders' responses to auction format changes, providing an important complement to theoretical model predictions. As such, it provides valuable information to auction designers when considering the implementation of different formats.
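
The basic difference-in-differences idea mentioned in the abstract can be illustrated with a two-group, two-period toy example. This is only a sketch: the prices below are hypothetical numbers invented for illustration, not data from the paper, and the paper's staggered-adoption regressions and synthetic difference-in-differences estimator are more involved than this. The example computes the estimate from cell means and shows that it matches the interaction coefficient of a saturated regression.

```python
import numpy as np

# Hypothetical observations: (treated publisher?, post-change period?, price).
rows = [
    (1, 0, 2.0), (1, 0, 2.2),   # treated, before the format change
    (1, 1, 3.0), (1, 1, 3.4),   # treated, after the format change
    (0, 0, 1.9), (0, 0, 2.1),   # control, before
    (0, 1, 2.0), (0, 1, 2.2),   # control, after
]
data = np.array(rows, dtype=float)
treated, post, price = data[:, 0], data[:, 1], data[:, 2]

def cell_mean(t, p):
    """Average price in one (group, period) cell."""
    return price[(treated == t) & (post == p)].mean()

# Change for treated publishers minus change for control publishers.
did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# Effect expressed as a share of the treated group's pre-treatment price level.
relative = did / cell_mean(1, 0)

# Same estimate from OLS: price ~ 1 + treated + post + treated * post.
X = np.column_stack([np.ones_like(price), treated, post, treated * post])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
delta = coef[3]

print(f"DiD = {did:.3f}, regression interaction = {delta:.3f}, relative = {relative:.1%}")
```

Expressing the effect relative to the treated group's pre-treatment mean mirrors how the abstract reports its 25% to 75% range.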

1907.00347 2026-05-11 math.DS

Geometric conditions for matrix domination in two dimensions

Argyrios Christodoulou

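AI summary This paper studies geometric conditions under which a finite subset of the special linear group is dominated, giving a necessary condition and a sufficient condition that involve only the traces and eigenvectors of the matrices and can be computed explicitly. It also provides a simple algorithm for constructing a dominated set with prescribed eigenvectors. The methods exploit the relationship between dominated sets and two-dimensional hyperbolic geometry.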

Comments 22 pages, to appear in Discrete Contin. Dyn. Syst

Abstract

In this article we prove a necessary and a sufficient condition for a finite subset of the special linear group to be dominated. These conditions are purely geometric in nature, as they only involve the trace and the eigenvectors of the matrices, and can be computed explicitly. Our sufficient condition, in particular, provides a simple algorithm for constructing a dominated set with prescribed eigenvectors. The techniques involved in our proofs take advantage of the interaction between dominated sets and two-dimensional hyperbolic geometry.

1409.6247 2026-05-11 cs.FL

Distributional Learning of Context-Free Languages under Fixed Finite-Monoid Typing

Takayuki Kuriyama

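AI summary This paper studies distributional learning of context-free languages under a recognizable congruence $\sim_h$ defined by a fixed finite monoid homomorphism $h$. By introducing a reconstruction theory with a typed refinement, the author defines a finite typed reconstruction basis and proves that this basis is exposed by a finite observation set. On this foundation, the paper shows that the target language can be reconstructed exactly from positive data and builds a hypothesis grammar that can be constructed in polynomial time, which establishes that the language class is identifiable in the limit from positive data. For the linear subclass, it further gives polynomial upper bounds on characteristic-sample size and word length.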

Abstract

We study distributional learning of context-free languages under a fixed recognizable congruence $\sim_h$ given as the kernel of an explicit finite monoid homomorphism $h:Σ^*\to M$. For this fixed-$h$ setting, we develop a finite typed reconstruction theory for context-free $\sim_h$-substitutable languages. Starting from a reduced context-free grammar, we introduce a typed refinement that records both yield types and outer context types, show that the relevant structure is concentrated in a finite typed reconstruction basis, and prove that this basis is exposed by a finite observation set. Occurrences of the same nonterminal symbol may therefore have to be separated when their outer $h$-contexts differ. We then prove exact reconstruction from positive data. From any finite sample $K\subseteqΣ^*$, we construct a canonical hypothesis grammar $\hat G(K)$, and we show that once $K$ contains the finite observation set associated with the target typed grammar, $\hat G(K)$ generates the target language exactly. Consequently, for every explicit finite monoid homomorphism $h$, the class $\mathcal C_h^{\mathrm{cf}}$ of context-free $\sim_h$-substitutable languages is identifiable in the limit from positive data, with polynomial-time hypothesis construction and update. For the linear subclass $\mathcal C_h^{\mathrm{lin}}$, we further prove polynomial upper bounds on characteristic-sample size and word length. Thus the same learner gives a full polynomial time-and-data result for the linear subclass.