arXivDaily: arXiv Daily Academic Digest, updated Monday to Friday
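2606.06961 2026-06-08 stat.ME New submission

Causal inference of Plackett-Burman designs in applications

Shuchen Chang, Zhi-ming Li

AI summary: Motivated by four applications of Plackett-Burman designs, proposes a potential-outcomes framework for causal inference: defines causal effects under a finite population, gives Neyman estimators with variance and covariance estimates, and carries out Fisher exact tests and interval construction.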

Abstract

Driven by four applications of Plackett-Burman (PB) designs, this paper proposes a causal inference framework based on potential outcomes. First, we define the causal effects of the PB designs under finite populations. The Neymanian estimator of causal effects is then obtained, including the estimated variance and covariance. Furthermore, we conduct a sharp null-hypothesis test and construct the Fisherian interval using an algorithm. Finally, the proposed methods are illustrated through these applications.
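
The two inferential ingredients named in this abstract, a Neymanian difference-in-means estimator and a Fisher randomization test of the sharp null, can be illustrated for a single two-level factor. This is a minimal sketch under stated assumptions, not the paper's PB-specific estimator: the function names `neyman_effect` and `fisher_p_value`, the toy outcomes, and the Monte Carlo approximation of the randomization distribution are all hypothetical choices made for illustration.

```python
import random
from statistics import mean, variance


def neyman_effect(y, z):
    """Difference in mean outcome between the +1 and -1 levels of one factor,
    with the usual conservative variance estimate s1^2/n1 + s0^2/n0."""
    hi = [yi for yi, zi in zip(y, z) if zi == 1]
    lo = [yi for yi, zi in zip(y, z) if zi == -1]
    tau_hat = mean(hi) - mean(lo)
    var_hat = variance(hi) / len(hi) + variance(lo) / len(lo)
    return tau_hat, var_hat


def fisher_p_value(y, z, draws=5000, seed=0):
    """Monte Carlo randomization p-value under the sharp null of no effect
    for any unit: outcomes stay fixed while factor levels are re-shuffled."""
    rng = random.Random(seed)
    observed = abs(neyman_effect(y, z)[0])
    z_perm = list(z)
    hits = 0
    for _ in range(draws):
        rng.shuffle(z_perm)
        if abs(neyman_effect(y, z_perm)[0]) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (draws + 1)


# Hypothetical outcomes for eight units, four at each level of the factor.
y = [10.2, 11.1, 9.8, 10.9, 13.0, 12.4, 13.3, 12.8]
z = [-1, -1, -1, -1, 1, 1, 1, 1]
tau_hat, var_hat = neyman_effect(y, z)
p_value = fisher_p_value(y, z)
```

A Fisherian interval could then be obtained by inverting this test over a grid of hypothesized constant effects, which is one common construction and not necessarily the algorithm used in the paper.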

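2606.06930 2026-06-08 stat.ME New submission

Testing Equality of Conditional Distributions via Generative Models

Hanjia Gao, Linjun Huang, Yun Yang, Xiaofeng Shao

AI summary: Proposes a generative-model-based test of whether two conditional distributions are equal; cross-generation aligns covariates, avoiding density-ratio estimation and high-dimensional smoothing; develops an RKHS-based test statistic with a bootstrap calibration algorithm and proves double robustness.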

Comments 93 pages, 4 figures

Abstract

We study the problem of testing whether two conditional distributions are equal using generative models. The proposed method learns a conditional generator from each sample and uses it to create responses at covariate values observed in the other sample, allowing generated and observed responses to be compared directly. By aligning covariates through cross-generation, the approach avoids conditional density-ratio estimation and local smoothing over high-dimensional covariates. The population version of this construction yields a conditional discrepancy that characterizes equality of the two conditional distributions under suitable overlap conditions, while the sample version leads to a test statistic defined as the supremum of an RKHS-indexed empirical process with multiplier bootstrap calibration. A computationally efficient algorithm for evaluating the statistic and its bootstrap analogue is developed based on alternating maximization and the kernel trick. Theoretically, we derive the limiting distribution of the test statistic under both the null and alternative hypotheses, prove bootstrap validity and consistency of the resulting test, and show that the proposed procedure attains a double-robustness property with respect to conditional generator estimation errors. Simulations and real data applications suggest that the proposed method performs well for multivariate responses and high-dimensional covariates.

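2606.06753 2026-06-08 stat.ME New submission

Cluster-Aware Conformal Calibration for Spatio-Temporal Distributional Prediction

Gooyoung Kim, Chae Young Lim, Wen-Ting Wang, Hao-Yun Huang, Wei-Ying Wu

AI summary: To address the inefficiency of DeepKriging's spatial basis functions under non-uniform sampling, proposes cluster-adaptive spatial bases and cluster-aware conformal calibration, improving coverage accuracy and tail reliability in spatio-temporal distributional prediction.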

Abstract

DeepKriging-style models, such as Spatio-Temporal DeepKriging, improve scalability through basis-function embeddings and stochastic gradient learning; however, fixed regular-grid spatial bases remain inefficient under highly non-uniform sampling patterns, often over-allocating capacity to sparse regions while under-resolving dense clusters. To address this limitation, we propose a practical extension of DeepKriging for reliable spatio-temporal distributional forecasting, incorporating cluster-adaptive spatial bases, whose centers and scales are initialized from the spatial sampling density, to better capture heterogeneous spatial sampling, together with cluster-aware conformal calibration that determines prediction-interval widths within spatial clusters (with a global fallback when calibration samples are insufficient). The resulting calibration pipeline explicitly targets spatial heterogeneity and local miscalibration, and experiments, including simulation studies and PM$_{2.5}$ data analysis, demonstrate substantially improved coverage accuracy and tail reliability under clustered observation patterns compared with a global conformal baseline.
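
The calibration rule described here, per-cluster interval widths with a global fallback when a cluster has too few calibration samples, can be sketched with split conformal prediction. This is a rough illustration, not the authors' pipeline: the absolute-residual score, the `min_samples` cutoff, and the function names are assumptions, and the paper's distributional setting may use a different conformity score.

```python
import math
from collections import defaultdict


def conformal_quantile(scores, alpha):
    """Split-conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest
    score, or infinity when the calibration set is too small for that rank."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def cluster_half_widths(residuals, clusters, alpha=0.1, min_samples=20):
    """Interval half-width per spatial cluster; clusters with fewer than
    min_samples calibration points fall back to the global half-width."""
    global_q = conformal_quantile([abs(r) for r in residuals], alpha)
    by_cluster = defaultdict(list)
    for r, c in zip(residuals, clusters):
        by_cluster[c].append(abs(r))
    widths = {}
    for c, scores in by_cluster.items():
        if len(scores) >= min_samples:
            widths[c] = conformal_quantile(scores, alpha)
        else:
            widths[c] = global_q
    return global_q, widths


# Hypothetical calibration residuals: one dense cluster, one sparse cluster.
residuals = [0.1 * (i % 10 + 1) for i in range(40)] + [2.0, -2.5, 3.0, -1.5, 2.2]
clusters = ["dense"] * 40 + ["sparse"] * 5
global_q, widths = cluster_half_widths(residuals, clusters)
```

In this toy data the dense cluster receives a narrower half-width than the global one, while the sparse cluster inherits the global value because it has only five calibration points.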

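2606.06730 2026-06-08 stat.ME New submission

Bayesian genome-wide clustering and variable selection of transcriptomic data via rank-based mixtures

Emilie Eliseussen, Haakon Muggerud, Luca Coraggio, Ida Scheel, Thomas Fleischer, Valeria Vitelli

AI summary: Proposes lowBM3, the first rank-based model extending the Bayesian Mallows model to jointly handle clustering and variable selection in ultra-high-dimensional data, providing a scalable Bayesian framework and demonstrating its effectiveness in a cancer genomics application.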

Comments 60 pages, 25 figures

Abstract

With the increasing availability of ranking data, there has been a growing demand for appropriate unsupervised rank-based inferential frameworks capable of handling high-dimensional datasets and providing uncertainty quantification for all estimates. Rank-based methods have also seen a growing popularity in -omics pipelines, as ranking continuous measurements provides a robust means of handling non-normally distributed data. The Bayesian Mallows model (BMM) has emerged as a promising choice because of its adaptability to various types of ranking data and its flexible framework, integrating cluster-wise rank aggregation with inference at the individual level. However, the scalability of BMM to ultra-high-dimensional settings, such as -omics analyses, has remained limited. The present paper addresses this issue by introducing the first rank-based model generalizing BMM to jointly handle clustering and variable selection, namely the lower-dimensional Bayesian Mallows Model Mixture (lowBM3). The proposed method provides a novel Bayesian framework that simultaneously handles heterogeneity in the sample, unsupervised parameter estimation, and model selection in a scalable manner for ultra-high-dimensional data. Additionally, a companion postprocessing framework is introduced to provide posterior summaries of the discrete posterior distributions of both the consensus ranking and the variable selector. Simulation studies are performed to assess the performance of the method. The usefulness of the method is also shown in an application to signature discovery for cancer genomics, where RNA-seq bulk gene expression data obtained from breast cancer patients are clustered genome-wide.

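2606.06699 2026-06-08 stat.ME stat.AP New submission

Robust inference for cyclic-stress accelerated life tests under interval monitoring with lognormal lifetimes

María Jaenada, Leandro Pardo, Kiran Prajapat

AI summary: For cyclic-stress accelerated life tests with interval-censored lognormal lifetimes, proposes robust estimation based on the weighted density power divergence, derives the asymptotic distribution and confidence intervals, and verifies resistance to outliers in simulations and a real example.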

Comments 35 pages, 7 figures, 6 tables

Abstract

Highly reliable products are often tested under accelerated conditions to provoke failures within a feasible timeframe. For products whose service life involves repeated alternation between two stress levels, such as automotive air-conditioners, batteries, and aerospace components, cyclic-stress accelerated life testing (CyALT) provides a more realistic loading profile than conventional accelerated tests. In practice, failures are often recorded only at scheduled inspection times, leading to interval-censored counts rather than exact lifetimes. Moreover, traditional maximum likelihood estimation is sensitive to data contamination, which is a genuine concern in small-sample industrial experiments. This paper develops robust inferential procedures for CyALT models with lognormal lifetimes under interval monitoring. Robust estimators are obtained by minimizing a weighted density power divergence (WDPD), leading to the weighted minimum density power divergence estimator (WMDPDE). We establish the asymptotic distribution of the WMDPDE, derive influence function expressions to characterize the robustness, and present asymptotic and bootstrap confidence intervals for important lifetime characteristics. A simulation study confirms that the WMDPDE provides substantial protection against outliers while retaining high efficiency under clean data. The methodology is illustrated through the analysis of an air-conditioner reliability dataset, demonstrating the practical advantages of robust inference in the CyALT framework.

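2606.06670 2026-06-08 stat.AP New submission

When Should Forecasting Models Be Re-Specified? A Cost-Sensitive Trigger for Adaptive Model-Form Updating

Harrison Katz

AI summary: Addresses how often the model form of a forecasting system should be updated, proposing a cost-sensitive trigger rule based on specification debt that reduces computational cost and instability while preserving forecast accuracy, validated on M4 data.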

Abstract

Forecasting systems are commonly refreshed at every review period, and that refresh usually bundles two distinct operations: estimating parameters and selecting the model form. Recent evidence suggests the second operation is often unnecessary, since intermediate updating strategies can hold forecast accuracy roughly fixed while cutting computational cost and forecast instability. This technical note takes up the complementary question. Once a system has adopted a reduced-update policy, when should it interrupt that policy and re-specify the model form? We define specification debt as the evidence accumulated against the deployed model form, and we use it to build a cost-sensitive trigger for re-specification. In a closed discrete model space the trigger reduces to a threshold on the negative log posterior probability of the deployed specification. In open production settings the same decision rule can be run with predictive score gaps, stacking weights, or calibrated monitoring diagnostics. Fixed update frequencies turn out to be a special case of the rule, recovered when evidence against the deployed form accumulates at a constant rate. We illustrate the idea on 500 monthly M4 series, comparing full updating, fixed model-form update frequencies, parameter-only updating, and capped adaptive score-triggered updating, and within the finite ETS grid we also compute information-criterion analogues of specification debt from AIC and BIC weights over the candidate forms. In that illustration the best capped adaptive policy is comparable to full updating in accuracy, runs in about 28 percent of full-update computational time, lowers forecast instability, and behaves like a fixed schedule with a small number of evidence-based exceptions.
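
The closed-model-space rule in this abstract, a threshold on the negative log posterior probability of the deployed specification, has an information-criterion analogue that can be sketched with Akaike weights. The snippet below is an illustration under stated assumptions rather than the note's implementation: the function names, the AIC values, and the threshold are hypothetical, and how the threshold is derived from re-specification costs is left to the paper.

```python
import math


def akaike_weights(aic):
    """Akaike weights over a finite set of candidate model forms."""
    best = min(aic.values())
    raw = {m: math.exp(-0.5 * (v - best)) for m, v in aic.items()}
    total = sum(raw.values())
    return {m: r / total for m, r in raw.items()}


def specification_debt(aic, deployed):
    """Evidence against the deployed form: negative log of its weight."""
    return -math.log(akaike_weights(aic)[deployed])


def should_respecify(aic, deployed, threshold):
    """Trigger re-specification once the debt exceeds a chosen threshold
    (hypothetical here; it stands in for the cost-sensitive level)."""
    return specification_debt(aic, deployed) > threshold


# Hypothetical AIC values for two ETS forms; "ANN" is the deployed one.
tied = {"ANN": 100.0, "AAN": 100.0}
drifted = {"ANN": 110.0, "AAN": 100.0}
debt_tied = specification_debt(tied, "ANN")
debt_drifted = specification_debt(drifted, "ANN")
trigger = should_respecify(drifted, "ANN", threshold=3.0)
```

When the two forms tie, the debt equals log 2; once the deployed form trails by ten AIC units the debt rises to about 5 and crosses the illustrative threshold of 3.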

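2606.06571 2026-06-08 stat.AP New submission

Counting the uncounted: How many were killed in Guatemala, 1978-1995?

Nils Lid Hjort

AI summary: For multinomial settings where the count in a null cell is unobservable, proposes parametric inference methods to estimate the unknown number, and applies them to estimating the number of deaths during the Guatemalan genocide (1978-1995).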

Comments 10 pages, 3 figures. Invited chapter, for invited talk, at the 40th International Workshop on Statistical Modelling, Oslo, June 28 to July 3, 2026; will be published in Conference Proceedings, in different layout etc

Abstract

In various application domains, there is a certain 'null cell', inside a multinomial setup, where observations are recorded for the other cells, but where one cannot count the number of occurrences for the null cell. I develop inference theory for assessing such unknown numbers, counting the uncounted, in situations where counts are available for the other cells, via parametric modelling. The methods are used to estimate the number of persons killed in Guatemala during the Genocidio guatemalteco years 1978--1995. There are three carefully curated lists of killed people, where the information can be mapped to a Venn diagram with $2^3=8$ cells. Summing over the seven observed cells, $R=47{,}803$ killed individuals can be identified, but how big is $N_{0,0,0}$, and hence $N=N_{0,0,0}+R$?
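
To make the three-list setup concrete, here is one textbook way to fill in the null cell: under a log-linear model with all pairwise list dependencies but no three-way interaction, the unobserved count has a closed form in the seven observed cells. This is a generic capture-recapture sketch and not the parametric models developed in the chapter; the cell counts below are hypothetical and are not the Guatemala data.

```python
def null_cell_estimate(n):
    """Estimate N_{0,0,0} from the seven observed cells of a three-list
    Venn diagram, assuming no three-way interaction between the lists.
    Keys are (a, b, c) with 1 meaning 'recorded on that list'."""
    numerator = n[(1, 1, 1)] * n[(1, 0, 0)] * n[(0, 1, 0)] * n[(0, 0, 1)]
    denominator = n[(1, 1, 0)] * n[(1, 0, 1)] * n[(0, 1, 1)]
    return numerator / denominator


# Hypothetical counts generated from three independent lists with
# inclusion probabilities 0.5, 0.2 and 0.5 in a population of 1000.
counts = {
    (1, 1, 1): 50, (1, 1, 0): 50, (1, 0, 1): 200, (0, 1, 1): 50,
    (1, 0, 0): 200, (0, 1, 0): 50, (0, 0, 1): 200,
}
observed_total = sum(counts.values())           # plays the role of R
n_000_hat = null_cell_estimate(counts)          # estimated null cell
population_hat = n_000_hat + observed_total     # estimated N
```

Because the toy counts come from exactly independent lists, the formula recovers the true null cell; with real lists the answer depends heavily on which dependence model is assumed, which is the question the chapter addresses.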

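2606.07445 2026-06-08 q-fin.MF cs.GT econ.TH q-fin.PR New submission

Bubbles vs. Baselines: Token Valuation and Institutional Capital in PoS Networks under EIP-1559

Mikhail Perepelitsa

AI summary: Builds an open-economy macroeconomic equilibrium model of the strategic interaction between institutional investors and retail consumers in PoS networks under EIP-1559, showing that token valuation is anchored to a fundamental baseline tied to network adoption and that institutional excess returns stem from the leveraged extraction of retail consumers' transactional utility.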

Abstract

This paper presents an open-economy macroeconomic equilibrium model for Proof-of-Stake (PoS) networks with fee-burn mechanics (EIP-1559) that formalizes the strategic interplay between a Kelly-optimizing rational institutional investor and a utility-driven retail consumer. We analyze network dynamics across two behavioral regimes. In The Unbounded Accumulation Model, the consumer purely accumulates tokens, creating an exclusive buy-side pressure that interacts with institutional portfolio rebalancing to fuel an ever-expanding speculative bubble and generate compounding excess returns for investors. Conversely, in The Utility-Consumption Model, the consumer dynamically buys and sells tokens to balance crypto wealth against real-world fiat consumption. Within this framework, we derive an explicit steady-state equilibrium price for ETH, demonstrating how token valuation anchors to a stable fundamental baseline that scales directly with network adoption while completely dissolving the institutional yield premium. Our numerical simulations show that while exogenous traditional finance (TradFi) shocks propagate through portfolio rebalancing to drive high token price volatility, network inflation remains highly stable. Furthermore, we prove that network security is insulated from institutional monopoly by counter-cyclical consumer behavior. Our findings reveal that institutional excess wealth creation in PoS ecosystems is not native to the staking protocol itself, but is strictly driven by the leveraged extraction of the retail consumer's continuous demand for transactional utility.

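2606.07109 2026-06-08 econ.GN q-fin.EC New submission

Museums as Policy Tools: The Behavioral Impact of Cultural Experiences

Paolo Pin, Roberto Rozzi, Alessandro Stringhi

AI summary: A field experiment finds that visitors who take a museum tour emphasizing the site's historical caregiving role subsequently donate more to a refugee NGO, indicating that thematic museum experiences can increase charitable behavior.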

Abstract

Museums can serve as policy tools when their content is purposefully curated. We designed a framed field experiment at the Santa Maria della Scala museum in Siena that leveraged the site's historical role offering care and hospitality. Student visitors randomly assigned to a tour emphasizing this function later donated more to an NGO supporting refugees than those who followed a standard artistic itinerary, with effects concentrated among female participants. These results show that thematically targeted museum experiences can measurably boost charitable behavior toward vulnerable groups, underscoring the untapped potential of cultural institutions in behavioral public policy.

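2606.07059 2026-06-08 q-fin.TR New submission

Diffusive in plain sight: An inconspicuous law of market impact

Julius F. Bonart

AI summary: By decomposing impact as the difference between realized and counterfactual returns and requiring both to be diffusive, derives an identity that restricts impact scaling for individual participants; it yields the square-root law in the information-neutral regime and a crossover to linear impact under strong informational coupling, consistent with empirical evidence.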

Abstract

Decomposing impact as the difference between realized and counterfactual returns and requiring both to be diffusive yields an identity that restricts admissible impact scaling at the level of individual participants. This constraint implies the square-root law in the information-neutral regime and a crossover to linear impact under strong informational coupling, consistent with empirical observations. In the weak-coupling regime, cumulative market impact is itself diffusive -- a diagnostic that many propagator and latent liquidity models fail to satisfy.

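2606.06737 2026-06-08 q-fin.MF New submission

Fast-excursion limit of the Heston model

Ryan McCrickerd

AI summary: Proposes a new fast-excursion model obtained from the Heston model under Mechkov's fast-reversion limit, in which instantaneous excursions through a price interval affect barrier-option hitting probabilities; introduces interval-valued processes and selections from random closed set theory, and simulations show a marked increase in barrier hitting probabilities.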

Comments 28 pages, 7 figures

Abstract

This article introduces an unconventional model for price processes in finance that emerges from the classical Heston model under Mechkov's fast-reversion limit. This new fast-excursion Heston model exhibits instantaneous (i.e. fast) excursions through an interval of prices at each time, which are invisible to vanilla options but critical for hitting probabilities and continuously monitored exotics. Theoretically, the model provides a rare example of a non-degenerate limit of stochastic volatility models that escapes the Skorokhod topologies. This leads us to a class of interval-valued processes which exist as lifts of subordinated Levy processes, through the concept of selections in the theory of random closed sets. On the practical side, we show how the model can be simulated using price-time parametric representations, and utilise a purpose-built classical Heston simulation scheme in order to visualise convergence. Finally we demonstrate how this model raises hitting probabilities for barrier options considerably (of order 10% for one-month EURUSD options), due to taking excursion risk into account.

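2606.06652 2026-06-08 econ.GN cs.CE cs.IT eess.SP math.IT q-fin.EC New submission

Probabilistic Risk Sensitivity and Loss Aversion in Cumulative Prospect Theory

Symeon Vaidanis, Marios Kountouris

AI summary: Proposes a binary-gamble framework that defines a probabilistic risk-sensitivity metric as a probability-threshold ratio, used to analyze acceptance and preference thresholds in cumulative prospect theory, and compares it with utility premia, probability premia, and Arrow-Pratt curvature measures.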

Comments This paper has been submitted for publication

Abstract

This paper develops a binary-gamble framework for characterizing risk sensitivity and loss aversion in Cumulative Prospect Theory (CPT). The proposed probabilistic risk-sensitivity metric is defined as a probability-threshold ratio that determines acceptance and preference thresholds in choice problems involving either a certain outcome and a binary gamble or two binary gambles. We show how standard notions of symmetric and non-symmetric bet aversion can be recovered within this framework, and we compare the resulting threshold-based conditions with utility premia, probability premia, and Arrow--Pratt curvature measures. The analysis clarifies when these criteria coincide and when they diverge, particularly for increasing aversion conditions, binary gambles with unequal probability distributions, and settings involving probability weighting functions. We also identify technical restrictions that arise when CPT-utility functions are used to represent loss aversion at the reference point. The resulting framework provides a decision-theoretic interpretation of risk sensitivity that is directly tied to probability thresholds and complements existing premium-based approaches.
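
The notion of an acceptance threshold for a binary gamble can be made concrete with the standard CPT ingredients. The sketch below finds the winning probability at which a mixed gamble (gain with probability p, loss otherwise) is valued equally to the status quo. It is an illustration only: the Tversky-Kahneman (1992) functional forms and parameter values are assumptions not taken from this abstract, the function names are hypothetical, and the paper's threshold-ratio metric itself is not reproduced.

```python
def weight(p, gamma):
    """Tversky-Kahneman (1992) inverse-S probability weighting function."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    num = p ** gamma
    return num / (num + (1 - p) ** gamma) ** (1 / gamma)


def value(x, alpha=0.88, lam=2.25):
    """Power value function with loss-aversion coefficient lam."""
    return x ** alpha if x >= 0 else -lam * (-x) ** alpha


def cpt_value(p, gain, loss, gamma_gain=0.61, gamma_loss=0.69):
    """CPT value of winning `gain` with probability p, else losing `loss`,
    measured against a reference point of zero."""
    return (weight(p, gamma_gain) * value(gain)
            + weight(1 - p, gamma_loss) * value(-loss))


def acceptance_threshold(gain, loss, tol=1e-10):
    """Smallest winning probability at which the gamble is acceptable,
    found by bisection (cpt_value is increasing in p for these forms)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cpt_value(mid, gain, loss) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


p_star = acceptance_threshold(gain=100.0, loss=100.0)
```

For a symmetric 100-versus-100 stake the threshold lies well above one half, which is the loss-aversion effect expressed directly on the probability scale.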

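2606.07372 2026-06-08 q-bio.PE math.DS New submission

Nullclines, Subnullclines and the Asymptotic and Transient Attractors in Eco-Evolutionary Dynamics

Krzysztof Argasinski, Manjyot Singh Bedi, Mark Broom

AI summary: Analyzing the eco-evolutionary dynamics of the classical Hawk-Dove game, finds that the stable and unstable equilibria determined by intersections of the frequency and density nullclines are connected by heteroclinic orbits, introduces the concept of subnullclines, and then shows that environmental seasonality produces complex cycling behavior in which subnullclines act as barriers to the propagation of perturbations.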

Abstract

In the demographic framework, the mortality payoff function describes the cost of an interaction and the fertility payoff function describes its reward. So while the mortality cost depends on the opponent's strategy, the fertility reward can be affected by the density-dependent juvenile recruitment survival. This motivates an analysis of the eco-evolutionary dynamics of the classical Hawk-Dove game. It is shown that the stable and unstable equilibria (determined by the intersections of frequency and density nullclines) are connected by heteroclinic orbits, which attract nearby trajectories. The resulting bundle of trajectories leads to the discovery of the so-called subnullclines (manifolds placed between the frequency and density nullclines) before they converge to the stable rest point. The initial isolated system is then extended by adding environmental seasonality (periodic background mortality), which acts as an external factor. This leads to complex cycling behavior, and the subnullclines act as barriers to the propagation of the perturbation (resilience/resistance threshold). Thus, in a way, this paper completes, yet extends, previous works on the eco-evolutionary dynamics of games with demographic payoffs.

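2606.07336 2026-06-08 q-bio.NC New submission

Fixed point compositionality via low-rank gluing rules in inhibition-dominated threshold-linear networks

Juliana Londono Alvarez

AI summary: Studies how structural modularity supports functional compositionality in inhibition-dominated threshold-linear networks; by introducing low-rank gluing rules, proves that global fixed points are combinations of local fixed points, and applies the results to graph-based networks to extend fixed point decomposition rules.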

Comments 39 pages, 18 figures

Abstract

Brains routinely generate highly flexible and complex behaviors on a relatively stable structure and limited resources. A key mechanism underlying this ability is compositionality, which allows the brain to efficiently decompose complex tasks into simpler, reusable primitives. While network modularity has often been linked to compositionality in biological and artificial networks, a rigorous mathematical characterization of this relationship in nonlinear networks is still lacking. In this work, we formally investigate how structural modularity supports functional compositionality in inhibition-dominated threshold-linear networks (TLNs). We introduce a novel class of modular network assembly called low-rank gluings, where component subnetworks with arbitrary internal connectivity are connected via specific low-rank couplings. We prove that the global fixed points of these networks are constrained to be combinations of the local fixed points of their constituent modules. For a more structured subclass, called rank-1 gluings, we provide a complete characterization that determines which combinations of local fixed points yield global ones. We apply these results to graph-based networks, extending fixed point decomposition rules from combinatorial threshold-linear networks (CTLNs) to the more flexible family of generalized CTLNs (gCTLNs), thereby proving that these structural rules are more robust than initially posited. Finally, we demonstrate that these gluing rules provide a mathematically tractable recipe for engineering compositional dynamics, enabling the construction of networks with a combinatorially large repertoire of predictable attractors that can be understood from simpler component motifs, ranging from compositions of fixed points to compositional limit cycles.
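
For readers unfamiliar with the model class, the sketch below simulates a small combinatorial threshold-linear network and checks that it settles on the expected fixed point. It assumes the standard CTLN conventions from the earlier literature (dynamics dx/dt = -x + [Wx + theta]_+, weights -1 + eps along edges and -1 - delta otherwise, with eps = 0.25, delta = 0.5, theta = 1); none of these details appear in the abstract, the function names are hypothetical, and the low-rank gluing rules themselves are not implemented.

```python
def ctln_weights(n, edges, eps=0.25, delta=0.5):
    """CTLN weight matrix: W[i][j] is 0 on the diagonal, -1 + eps when the
    graph has an edge j -> i, and -1 - delta when it does not."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                W[i][j] = -1 + eps if (j, i) in edges else -1 - delta
    return W


def simulate(W, x0, theta=1.0, dt=0.01, steps=5000):
    """Forward-Euler integration of dx/dt = -x + max(0, W x + theta)."""
    n = len(W)
    x = list(x0)
    for _ in range(steps):
        drive = [sum(W[i][j] * x[j] for j in range(n)) + theta
                 for i in range(n)]
        x = [xi + dt * (-xi + max(0.0, d)) for xi, d in zip(x, drive)]
    return x


# Two neurons joined by a bidirectional edge (a 2-clique), given as
# (source, target) pairs; both neurons stay active at the fixed point.
edges = {(0, 1), (1, 0)}
x_final = simulate(ctln_weights(2, edges), x0=[0.2, 0.5])
expected = 1.0 / (1.0 + 0.75)  # solves x = 1 - 0.75 * x for each neuron
```

Gluing several such modules together and asking which combinations of their local fixed points survive globally is the question the paper treats analytically.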

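2606.07301 2026-06-08 q-bio.QM New submission

Structure-guided taxonomic placement of divergent RNA viruses with ViraClass

Sheng Xu, Wenxuan Huang, Shutong Yue, Weiqiang Bai, Shiyang Feng, Xiaohan He, Bo Zhang, Qiantai Feng, Edward C. Holmes, Weifeng Shi, Siqi Sun

AI summary: To address the low RdRp sequence similarity that hampers RNA virus classification, proposes ViraClass, a protein-structure-based framework for hierarchical classification from phylum to genus that outperforms sequence-based methods at deep evolutionary distances.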

Abstract

Metatranscriptomic sequencing has expanded our knowledge of the RNA virosphere far more rapidly than novel viruses can be taxonomically classified. Taxonomic assignment above the family level is particularly difficult because the RNA-dependent RNA polymerase (RdRp) is often the only gene retained across RNA viruses yet exhibits little sequence similarity among highly divergent viruses. Here we show that RdRp protein structure retains taxonomic signal at evolutionary depths where RdRp primary sequence similarity has largely collapsed, and that the organization of this signal is consistent with the current ICTV hierarchy. Based on this, we developed ViraClass, a hierarchical framework for RNA virus taxonomic placement that uses RdRp structure for rank-by-rank assignment from phylum to genus, stopping at the deepest rank supported by confidence thresholds, and calibrated structural clustering for viruses that remain outside existing reference space. Across random-split, prospective and taxonomic hold-out benchmarks, ViraClass outperforms sequence-based and genome-content baselines. The largest gains emerge at deep evolutionary distances, in benchmarks that withhold entire families, orders or classes from the reference, where sequence-based methods lose most of their signal. In challenging boundary cases such as the Flaviviridae, ViraClass's structure-based placements capture the taxonomic boundary tensions highlighted by recent phylogenetic studies. When applied to a large collection of previously unclassified RdRp sequences, ViraClass places high-confidence queries into existing phyla and organizes the remainder into compact structural groups. ViraClass therefore provides a scalable approach from large-scale virus discovery to hierarchical taxonomic interpretation, particularly at the deep evolutionary ranges that current sequence-based pipelines cannot reach.

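2606.06889 2026-06-08 q-bio.GN New submission

From Genomes to Algorithms: Neural Network Applications for Palimpsest Detection in Medieval Manuscripts

James B. Harr, Madelin E. Blong, Tessa Gadomski, Kelly A. Meiklejohn, William E. Gundling

AI summary: Using non-destructive sampling and sequencing combined with machine learning classifiers (logistic regression and neural networks), evaluates the effect of palimpsest preparation on DNA integrity and explores computational methods for identifying palimpsests.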

Abstract

Biocodicology, the study of biological information preserved in manuscripts, offers new opportunities to examine parchment as both a textual and biological artefact. This study applies non-destructive sampling to isolate and sequence mitochondrial genomes (mtGenomes) from a 14th-century manuscript, Ms. Codex 1629, which contains both single-use and palimpsested folios. We sought to evaluate whether palimpsest preparation, including chemical washing, compromised DNA integrity and whether computational methods could aid in identifying reused parchment. DNA sequencing revealed that both single-use and palimpsested parchments retained sufficient mtGenomes for analysis, with no significant differences in genome coverage or depth. To assess the potential of computational biology in manuscript studies, we implemented machine learning classifiers, including logistic regression and neural networks, to distinguish palimpsests from single-use folios. Models achieved high precision but exhibited reduced recall for the minority palimpsest class, reflecting dataset imbalance. While additional ancient mtGenome samples from palimpsests and further testing are needed, this study demonstrates how integrating molecular biology and neural networks opens new approaches for palimpsest detection and underscores the evolving role of data science in biocodicology.

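2606.06749 2026-06-08 q-bio.QM New submission

Deterministic access to global viral sequence data enables robust agentic scientific discovery

Ferdous Nasri, Sarah Gurev, Patrick Varilly, Krithik Ramesh, Nuala A. O'Leary, Jonah Cool, Bernhard Y. Renard, Pardis C. Sabeti, Laura Luebbert

AI summary: To address the high error rates of LLM-based scientific agents in viral data retrieval, proposes gget virus, a deterministic query framework that formalizes NCBI Virus-style filtering, metadata constraints, and structured record retrieval, raising retrieval accuracy to at least 90% and cutting data transfer by more than 98% for high-volume queries.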

Abstract

Public viral genome resources such as the National Center for Biotechnology Information (NCBI) Virus database are central to outbreak response, evolutionary analysis, vaccine design, and genomic surveillance. Yet many high-value retrieval workflows remain optimized for interactive use rather than deterministic, reproducible programmatic interfaces. This creates a challenge for Large Language Model (LLM)-based scientific agents, where errors in metadata interpretation, filtering logic, or retrieval can propagate into incorrect datasets. To evaluate agentic viral data retrieval, we built VirBench, a manually curated benchmark of 120 queries spanning diverse pathogens, taxonomic levels, and metadata filters. When autonomous AI systems, including Biomni, Claude, GPT, and Edison Analysis, were tasked with these queries without a dedicated retrieval layer, performance varied widely: mean accuracy ranged from 16.9% for Claude Sonnet 4 to 91.3% for GPT-5.5, with newer frontier models showing progress but residual errors remaining consequential. To address this, we built gget virus, a deterministic query framework that formalizes NCBI Virus-style filtering as a reproducible programmatic system. By staging retrieval, applying metadata constraints before sequence download, and retrieving structured GenBank records, gget virus reduces data transfer by more than 98% for high-volume queries while preserving exact-match semantics. Instructing autonomous AI systems to use gget virus increased accuracy to at least 90.0% across all evaluated systems and up to 99.7% for GPT-5.5, improved response stability to 0.92-1.00, reduced error magnitude, and generally decreased runtime and tool calls. Together, this work establishes deterministic data access as critical infrastructure for reliable agentic science and provides a reproducible retrieval layer for robust human- and AI-driven viral genomics workflows.
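The staging idea described in the abstract, filtering on metadata before any sequence is downloaded, can be illustrated with a small sketch. This is not the gget virus API: the `Record` fields, `filter_metadata`, `fetch_sequences`, and the accession strings are hypothetical names chosen for illustration, and the in-memory `store` stands in for a remote database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    accession: str   # hypothetical identifier
    host: str
    country: str
    length: int      # sequence length in nucleotides


def filter_metadata(records, host=None, country=None, min_length=0):
    """Stage 1: exact-match filtering on metadata only; no sequence is touched."""
    kept = []
    for r in records:
        if host is not None and r.host != host:
            continue
        if country is not None and r.country != country:
            continue
        if r.length < min_length:
            continue
        kept.append(r)
    # Sorting makes the result independent of input order.
    return sorted(kept, key=lambda r: r.accession)


def fetch_sequences(records, store):
    """Stage 2: fetch only the sequences whose metadata survived stage 1."""
    return {r.accession: store[r.accession] for r in records}


records = [
    Record("ACC003", "Homo sapiens", "Peru", 29800),
    Record("ACC001", "Homo sapiens", "Kenya", 29903),
    Record("ACC002", "Sus scrofa", "Kenya", 13500),
]
store = {"ACC001": "ATGC", "ACC002": "GGCC", "ACC003": "TTAA"}  # toy sequences

hits = filter_metadata(records, host="Homo sapiens", min_length=29000)
sequences = fetch_sequences(hits, store)
```

Because the filter is a pure function of the metadata and the query, repeating the same query yields the same record set, which is the property an agent needs for reproducible retrieval.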

2606.06562 2026-06-08 q-bio.QM 新提交

Iterative AI-guided optimisation of selective triple-drug combinations for breast cancer

AI引导的选择性三联药物组合用于乳腺癌的迭代优化

Oghenejokpeme Orhobor, Abbi Abdel-Rehim, Emma Tate, Holly X. Smith, Elizabeth Bourne, Ross J. Collins, Larisa N. Soldatova, Ross D. King

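AI summary: Proposes an AI-guided, QSAR-driven iterative optimisation framework that couples machine learning with automated experimental screening for closed-loop discovery of selective three-drug combinations, rapidly enriching for highly selective, high-efficacy regimens against MCF7 breast cancer cells.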

Comments 4 figures, 3 tables

详情
AI中文摘要

个性化癌症治疗旨在根据个体肿瘤特征定制治疗方案,然而肿瘤异质性和适应性耐药性持续限制临床疗效。药物组合通过同时靶向多条通路提供克服耐药性的策略,但其合理设计受限于巨大的组合搜索空间和实验成本。本文提出一个AI引导的、QSAR驱动的迭代优化框架,将机器学习与自动化实验筛选相结合,实现选择性多药疗法的闭环发现。从初始随机筛选开始,系统迭代预测、测试和优化针对MCF7乳腺癌细胞的三药组合。引入非致瘤性MCF10A细胞使得能够显式优化肿瘤选择性疗效,优先选择最大化杀伤癌细胞同时保护健康细胞的方案。经过连续迭代,该框架快速富集高选择性、高效能的组合,同时保持化学和机制多样性,避免收敛于狭窄解空间。通过持续从实验反馈中学习,该方法高效探索数百万种组合,识别出一小组经过验证的、肿瘤选择性方案。这些结果建立了AI驱动的闭环优化高阶药物组合的可扩展概念验证,展示了计算与实验的迭代整合如何实现精准肿瘤学中自适应且可能个性化的治疗设计。

英文摘要

Personalised cancer therapy aims to tailor treatment to individual tumour profiles, yet tumour heterogeneity and adaptive resistance continue to limit clinical efficacy. Drug combinations offer a strategy to overcome resistance by simultaneously targeting multiple pathways, but their rational design is constrained by the vast combinatorial search space and experimental cost. Here, we present an AI-guided, QSAR-driven iterative optimisation framework that integrates machine learning with automated experimental screening to enable closed-loop discovery of selective multi-drug therapies. Starting from an initial random screen, the system iteratively predicts, tests, and refines three-drug combinations targeting MCF7 breast cancer cells. Incorporation of non-tumorigenic MCF10A cells enables explicit optimisation of tumour-selective efficacy, prioritising regimens that maximise cancer cell killing while sparing healthy cells. Across successive iterations, the framework rapidly enriched for highly selective, high-efficacy combinations, while maintaining chemical and mechanistic diversity and avoiding convergence on a narrow solution space. By continuously learning from experimental feedback, the approach efficiently navigates millions of combinations to identify a small set of validated, tumour-selective regimens. These results establish a scalable proof-of-concept for AI-driven, closed-loop optimisation of higher-order drug combinations, demonstrating how iterative integration of computation and experimentation can enable adaptive and potentially personalised therapeutic design in precision oncology.
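Two quantities in the abstract lend themselves to a short worked example: the size of the three-drug search space and the notion of tumour-selective efficacy. The sketch below is illustrative only. The 200-drug library size, the `selectivity` score (healthy-cell viability minus cancer-cell viability), and the toy measurements are assumptions, not values or definitions taken from the paper.

```python
from math import comb


def selectivity(viability_healthy, viability_cancer):
    """Hypothetical score: high when healthy cells survive and cancer cells die."""
    return viability_healthy - viability_cancer


def rank_by_selectivity(measurements):
    """Sort (combo, healthy viability, cancer viability) by descending selectivity."""
    return sorted(measurements, key=lambda m: selectivity(m[1], m[2]), reverse=True)


# Number of unordered triples that can be drawn from an assumed 200-drug library.
n_triples = comb(200, 3)

measurements = [
    (("A", "B", "C"), 0.90, 0.20),  # spares healthy cells, kills cancer cells
    (("A", "B", "D"), 0.30, 0.10),  # potent but toxic to both cell lines
    (("B", "C", "D"), 0.95, 0.85),  # safe but ineffective
]
best = rank_by_selectivity(measurements)[0][0]
```

Even a library of a few hundred compounds gives more than a million triples, which is why a model is used to choose which small batch to test in each iteration rather than screening exhaustively.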

2606.07487 2026-06-08 cs.MA cs.GT cs.SI 新提交

Modelling Opinion Dynamics at Scale with Deep MARL

用深度MARL建模大规模意见动态

Lukas Seier, Brandon Kaplowitz, Sebastian Towers, Richard Bailey, Jakob Foerster

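AI summary: Introduces a GPU-accelerated consensus and truth-finding game, extends other-play to general-sum social interactions, and validates the model on a subset of the Bluesky network, finding that in large social media networks high conformity reduces collective accuracy and promotes dishonest behaviour.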

Comments 35 pages, 28 figures, preprint

详情
AI中文摘要

意见动态建模通常依赖于手工设计的局部交互规则来研究涌现的宏观现象,如共识和极化。相比之下,多智能体强化学习(MARL)使智能体能够通过优化简单奖励直接学习此类行为。为了探索MARL在意见动态中的潜力,我们引入了一个GPU加速的共识与真相发现游戏,该游戏可扩展到多达1000个智能体的人群,与许多现实世界的社会子网络相当。为了防止不切实际的约定,我们将其他玩法扩展到一般和社交互动。接下来,我们通过学习的注意力层仅从图拓扑中恢复智能体重要性结构,在Bluesky网络的一个子集上验证了我们的模型,发现高度从众的人群与人类数据最匹配。在大型社交媒体网络中,这种高度的从众性显著降低了集体准确性,并促进了为了融入而撒谎的不诚实智能体。相比之下,小型、动态的狩猎采集网络受影响较小;在这里,从众甚至可以提高集体一致性。这表明进化的人类从众启发式与现代社交媒体环境之间的不匹配可能是错误信息的潜在促成因素。

英文摘要

Modelling opinion dynamics typically relies on hand-crafted local interaction rules to study emergent macroscopic phenomena such as consensus and polarisation. In contrast, multi-agent reinforcement learning (MARL) enables agents to learn such behaviours directly by optimising simple rewards. To explore the potential of MARL for opinion dynamics, we introduce a GPU-accelerated consensus and truth-finding game that scales to populations of up to 1000 agents, comparable to many real-world social sub-networks. To prevent unrealistic conventions, we extend other-play to general-sum social interactions. We next validate our model on a subset of the Bluesky network by recovering agent importance structures from graph topology alone via a learned attention layer, finding that highly conforming populations most closely match human data. In large social media networks such high levels of conformity significantly reduce collective accuracy and promote dishonest agents that lie to fit in. By contrast, small, dynamic hunter-gatherer networks are less affected; here, conformity can even improve collective agreement. This suggests a mismatch between evolved human conformity heuristics and modern social media environments as a potential contributor to misinformation.
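For context, a hand-crafted local interaction rule of the kind the abstract contrasts with MARL can be written in a few lines. The sketch below uses DeGroot-style averaging, a classical rule chosen here for illustration; it is not the model from the paper, and the three-agent path graph is a hypothetical toy network.

```python
def degroot_step(opinions, neighbours):
    """One synchronous update: each agent averages itself and its neighbours."""
    updated = []
    for i, own in enumerate(opinions):
        group = [own] + [opinions[j] for j in neighbours[i]]
        updated.append(sum(group) / len(group))
    return updated


# Path graph 0 - 1 - 2 with initial opinions spread over [0, 1].
neighbours = {0: [1], 1: [0, 2], 2: [1]}
opinions = [0.0, 0.5, 1.0]
for _ in range(200):
    opinions = degroot_step(opinions, neighbours)

spread = max(opinions) - min(opinions)  # shrinks towards 0 as consensus forms
```

Here the behaviour (convergence to a shared value) is fixed by the rule itself. In the MARL setting the update is instead learned by each agent from a reward signal, so conformity or dishonesty can emerge rather than being assumed.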

2606.07486 2026-06-08 eess.SY cs.SY 新提交

OPENPATH: A Supervisor--Specialist Agent System for Personalized, Accessible, and Multi-stop Urban Trip Planning

OPENPATH: 一种用于个性化、无障碍和多站点城市出行规划的监督-专家智能体系统

Ziyang Xiong, He Zong, Zhiyuan Xue, Manxi Wu

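AI summary: Presents OPENPATH, a supervisor-specialist multi-agent system in which LLM agents parse natural-language requests and classical algorithms optimize routes, supporting personalized preferences, multi-stop planning, and accessibility requirements, and doubling as a tool for city-scale accessibility analysis.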

详情
AI中文摘要

城市出行规划系统通常针对旅行时间和成本进行优化,但对真实旅行者带来的异质需求(如个性化偏好、多站点行程构建和端到端轮椅无障碍)支持有限。我们提出OPENPATH,一个监督-专家多智能体系统,在单一架构中处理所有这些任务。OPENPATH采用明确的劳动分工:LLM智能体解析自然语言输入、分类请求意图并协调执行,而经典算法在精选的移动性和无障碍数据上进行路线优化。这种设计确保生成的行程尊重异质用户偏好,并在请求时强制执行严格的无障碍要求。除了针对单个用户的规划,OPENPATH还作为城市规模无障碍分析的测量工具:应用于纽约市,系统揭示了显著的ADA基础设施差距,并量化了其对轮椅使用者就业可达性的影响。总体而言,本研究展示了监督-专家LLM智能体框架如何在真实城市环境中支持异质出行规划和透明、公平的交通分析。

英文摘要

Urban trip-planning systems are commonly optimized for travel time and cost, but they offer limited support for the heterogeneous needs that real travelers bring, such as personalized preferences, multi-stop itinerary construction, and end-to-end wheelchair accessibility. We present OPENPATH, a supervisor-specialist multi-agent system that handles all of these tasks within a single architecture. OPENPATH adopts a deliberate division of labor: LLM agents parse natural-language input, classify request intent, and orchestrate execution, while classical algorithms perform route optimization over curated mobility and accessibility data. This design ensures that the resulting trip honors heterogeneous user preferences and enforces strict accessibility requirements when requested. Beyond per-user planning, OPENPATH doubles as a measurement instrument for city-scale accessibility analysis: applied to NYC, the system reveals substantial ADA infrastructure gaps and quantifies their effect on job accessibility for wheelchair users. Overall, this study shows how a supervisor-specialist LLM agentic framework can support heterogeneous trip planning and transparent, equitable transportation analysis in real urban environments.

2606.07470 2026-06-08 cs.CR 新提交

Verifiable and Confidential DNN Inference on Low-End Edge Devices

低端边缘设备上的可验证且保密的DNN推理

Mohamed Khalil Kiri, Ivan De Oliveira Nunes, Aurélien Francillon, Norrathep Rattanavipanon

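AI summary: Presents VECODI, a framework built on SHANGRI-LA, a new execution abstraction for TrustZone-M TEEs; it runs untrusted inference code in the Non-Secure World and relies on minimal, application-agnostic Secure-World support to provide model confidentiality and verifiable results on low-end edge devices.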

Comments 12 pages, 4 figures, 5 tables, 1 algorithm

详情
AI中文摘要

在低端边缘设备上部署深度神经网络(DNN)推理带来了两个关键挑战:保护模型机密性以防止潜在受损的边缘系统,以及在不产生过高开销的情况下实现可验证推理。现有方法要么将部分模型和推理软件置于可信执行环境(TEE)内,导致高成本和应用程序相关的可信计算基(TCB),要么在不可信环境中执行,安全性较低。在这项工作中,我们提出了VECODI,一个用于在受限边缘设备上进行可验证且保密的DNN推理的框架。其核心是VECODI引入了SHANGRI-LA,一种在TrustZone-M TEE上的新执行抽象,它建立了一个权限严格介于安全世界和非安全世界之间的第三运行时环境。VECODI利用SHANGRI-LA在非安全世界中执行不可信的推理代码,同时使用最小的与应用无关的安全世界支持来保护模型机密性,并实现推理结果的可验证性(关于推理代码和模型参数的正确执行)。我们在真实的NUCLEO-L552ZE-Q开发板上实现了VECODI,并开源了其原型。我们的结果表明,VECODI具有较小的TCB、内存占用和运行时开销,使其成为低端边缘设备中安全推理的实用选择。

英文摘要

Deploying deep neural network (DNN) inference on low-end edge devices raises two key challenges: protecting model confidentiality against a potentially compromised edge system and enabling verifiable inference without incurring prohibitive overhead. Existing approaches either house partial models and inference software within trusted execution environments (TEEs), resulting in high cost and an application-dependent trusted computing base (TCB), or execute in untrusted environments, providing little security. In this work, we present VECODI, a framework for verifiable and confidential DNN inference on constrained edge devices. At its core, VECODI introduces SHANGRI-LA, a new execution abstraction on TrustZone-M TEEs that establishes a third runtime environment with privileges strictly between the Secure and Non-Secure Worlds. VECODI leverages SHANGRI-LA to execute untrusted inference code in the Non-Secure World while using minimal application-agnostic Secure-World support to protect model confidentiality and enable verifiability (with respect to proper execution of inference code and model parameters) of inference results. We realize VECODI on a real-world NUCLEO-L552ZE-Q development board and open-source its prototype. Our results show VECODI's small TCB, memory footprint, and runtime overhead, making it a practical option for secure inference in low-end edge devices.

2606.07455 2026-06-08 cs.AR 新提交

A 65 nm Trustworthy Hypoglycemia Forecasting Engine Achieving 11.3 nJ per Inference

一种65纳米可信赖的低血糖预测引擎,每次推理能耗11.3 nJ

Boyang Cheng, Jianbo Liu, Pengyu Ren, Xueji Zhao, Steven Davis, Likai Pei, Zephan M. Enciso, Kai Ni, Ningyuan Cao

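AI summary: Presents a hypoglycemia forecasting engine based on probabilistic decision trees whose hybrid architecture reduces inference complexity; fabricated in 65 nm CMOS, it achieves 11.3 nJ per inference and an F1 score of 0.825, and improves robustness to sensor noise and data point drop-off by 4.1x to 16.1x.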

详情
AI中文摘要

糖尿病影响数百万人,需要可靠的连续血糖监测以进行早期低血糖预警。然而,医疗AI系统不仅需要准确和节能,还需要可解释、对噪声鲁棒且具有不确定性意识。本文提出一种基于概率决策树的65 nm低血糖预测引擎,用于可信赖的医疗推理。所提出的混合架构将浅层树的精确算术评估与深层树的基于采样的推理相结合,将软决策树的复杂度从指数级降低到样本高效遍历。一个可重构的4×24×24概率节点阵列支持最大深度为12的任意树结构,并由片上低功耗RISC-V内核协调。采用65 nm CMOS制造,该芯片每次推理能耗为11.3 nJ,在连续血糖监测数据上实现了最先进的30分钟预测F1分数0.825。与传统决策树和随机森林模型相比,所提出的引擎对传感器噪声和数据点丢失的鲁棒性提高了4.1倍至16.1倍。这些结果表明,该引擎是一种用于可信赖低血糖预测的节能、可解释且具有不确定性意识的边缘AI引擎。

英文摘要

Diabetes affects millions of people and requires reliable continuous glucose monitoring for early hypoglycemia warning. However, medical AI systems must be not only accurate and energy-efficient, but also explainable, noise-robust, and uncertainty-aware. This work presents a 65 nm hypoglycemia forecasting engine based on probabilistic decision trees for trustworthy medical inference. The proposed hybrid architecture combines exact arithmetic evaluation for shallow tree layers with sampling-based inference for deeper layers, reducing soft decision tree complexity from exponential to sample-efficient traversal. A reconfigurable 4×24×24 probabilistic node array supports arbitrary tree structures with a maximum depth of 12, coordinated by an on-chip low-power RISC-V core. Fabricated in 65 nm CMOS, the chip achieves 11.3 nJ per inference and a state-of-the-art 30 min forecasting F1 score of 0.825 on continuous glucose monitoring data. Compared with conventional decision tree and random forest models, the proposed engine improves robustness to sensor noise and data point drop-off by 4.1x to 16.1x. These results demonstrate an energy-efficient, explainable, and uncertainty-aware edge AI engine for trustworthy hypoglycemia forecasting.
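The trade-off between exact evaluation and sampling-based inference in a soft decision tree can be shown with a software sketch. This is a generic illustration rather than the chip's architecture: it assumes a complete binary tree stored in heap order, a scalar input, sigmoid gates, and randomly drawn toy parameters, all of which are assumptions made here.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def exact_output(x, weights, biases, leaves):
    """Exact expectation: visits every leaf, so cost grows as 2**depth."""
    n_internal = len(weights)

    def visit(node, prob):
        if node >= n_internal:  # heap index beyond the internal nodes is a leaf
            return prob * leaves[node - n_internal]
        p_right = sigmoid(weights[node] * x + biases[node])
        return (visit(2 * node + 1, prob * (1.0 - p_right))
                + visit(2 * node + 2, prob * p_right))

    return visit(0, 1.0)


def sampled_output(x, weights, biases, leaves, n_samples, rng):
    """Monte Carlo estimate: each sample follows one root-to-leaf path."""
    n_internal = len(weights)
    total = 0.0
    for _ in range(n_samples):
        node = 0
        while node < n_internal:
            p_right = sigmoid(weights[node] * x + biases[node])
            node = 2 * node + 2 if rng.random() < p_right else 2 * node + 1
        total += leaves[node - n_internal]
    return total / n_samples


rng = random.Random(0)
depth = 3
n_internal = 2 ** depth - 1
weights = [rng.uniform(-2, 2) for _ in range(n_internal)]
biases = [rng.uniform(-1, 1) for _ in range(n_internal)]
leaves = [rng.random() for _ in range(2 ** depth)]

exact = exact_output(0.4, weights, biases, leaves)
approx = sampled_output(0.4, weights, biases, leaves, 20000, rng)
```

Exact evaluation touches all `2**depth` leaves, while each sample touches only `depth` gates. A hybrid of the kind the abstract describes would evaluate the first few levels exactly and sample only below them.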

2606.07453 2026-06-08 cs.DS cs.DM 新提交

Odd Cycle Transversal in $P_k$-Free Graphs

在 $P_k$-自由图中的奇环横贯问题

Akramah Faizi, Arash Rafiey

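AI summary: For $P_k$-free graphs, presents a constant-factor approximation algorithm for Odd Cycle Transversal built on a structural decomposition into rings of bipartite graphs, with approximation ratio $k-2$ for odd $k$ and $k-3$ for even $k$.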

详情
AI中文摘要

奇环横贯(OCT)问题要求找到一个最小顶点子集,删除后使图变为二分图,是算法图论中的核心问题。已知即使在$P_k$-自由图上,对于$k \ge 6$,该问题也是NP完全的。此外,假设唯一游戏猜想(UGC),OCT在一般图上不存在常数因子近似算法。受这些困难结果的启发,我们研究了OCT在$P_k$-自由图上的可近似性。我们首先证明,该问题在$P_k$-自由图的特定子类上可以在多项式时间内解决,最值得注意的是$(P_6, C_3)$-自由图,通过利用二分图环的结构分解。以这些可处理的子结构为基础,我们提出了一个针对一般$P_k$-自由图上OCT的常数因子近似算法。当$k$为奇数时,我们达到$k-2$的近似比;当$k$为偶数时,近似比为$k-3$。这些结果提供了此类图依赖$k$的第一个非平凡常数因子近似,与UGC的推论一致,即不太可能存在与$k$无关的近似因子。

英文摘要

The Odd Cycle Transversal (OCT) problem, which asks for a minimum subset of vertices whose removal renders a graph bipartite, is a central problem in algorithmic graph theory. It is known to be NP-complete even on $P_k$-free graphs for $k \ge 6$. Furthermore, assuming the Unique Games Conjecture (UGC), OCT does not admit a constant-factor approximation algorithm on general graphs. Motivated by these hardness results, we investigate the approximability of OCT on $P_k$-free graphs. We first establish that the problem becomes polynomial-time solvable on specific subclasses of $P_k$-free graphs, most notably $(P_6, C_3)$-free graphs, by exploiting a structural decomposition into rings of bipartite graphs. Leveraging these tractable substructures as a basis, we present a constant-factor approximation algorithm for OCT on general $P_k$-free graphs. We achieve an approximation ratio of $k-2$ when $k$ is odd and $k-3$ when $k$ is even. These results provide the first nontrivial constant-factor approximations for this class dependent on $k$, aligning with the UGC implication that no approximation factor independent of $k$ is likely to exist.
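The problem definition can be made concrete with a brute-force solver for tiny graphs. This is not the approximation algorithm from the paper; it simply tries vertex subsets in order of increasing size and uses a BFS two-colouring to check bipartiteness, so it is exponential in the number of vertices. The function names and the example graph are chosen here for illustration.

```python
from collections import deque
from itertools import combinations


def is_bipartite(vertices, edges):
    """BFS two-colouring of the subgraph induced by `vertices`."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        if u in adj and v in adj:  # ignore edges touching removed vertices
            adj[u].append(v)
            adj[v].append(u)
    colour = {}
    for start in vertices:
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in colour:
                    colour[w] = 1 - colour[u]
                    queue.append(w)
                elif colour[w] == colour[u]:
                    return False  # an odd cycle exists
    return True


def min_oct(vertices, edges):
    """Smallest vertex set whose removal leaves a bipartite graph (exhaustive)."""
    vertices = list(vertices)
    for size in range(len(vertices) + 1):
        for removed in combinations(vertices, size):
            rest = [v for v in vertices if v not in removed]
            if is_bipartite(rest, edges):
                return set(removed)


# Two triangles sharing vertex 2: removing that vertex destroys both odd cycles.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]
solution = min_oct(range(5), edges)
```

An approximation algorithm with ratio $k-2$ would be allowed to return a set up to $k-2$ times larger than this optimum, in exchange for running in polynomial time.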

2606.07450 2026-06-08 cs.SI q-fin.PM q-fin.ST 新提交

Information Networks of Stock Prices

股票价格的信息网络

Muhammad Aldy Hassan, Hokky Situngkir

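AI summary: Compares Pearson correlation and mutual information on the Indonesian capital market, finding that the Pearson, MST, and Infomap configuration is the most robust at recovering the conventional sector classification, while mutual information combined with PMFG exposes hidden economic sub-structures.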

Comments 12 pages, 6 figures

详情
AI中文摘要

股票价格的集体运动蕴含着复杂的相互依赖关系,传统上仅通过线性视角进行简化。本文通过测试皮尔逊相关和互信息在揭示市场谱动态方面的极限,探索了印尼资本市场的计算结构网络表示。在2015年至2025年的2328个滚动观察窗口中,我们检验了24种方法配置,这些配置结合了三种依赖估计器(皮尔逊、MI自适应分箱和MI-kNN)、两种图过滤方案(最小生成树/MST和平面最大过滤图/PMFG)以及四种社区解码器。实证结果揭示了一个基本事实:拓扑丰富度并不总是与行业分类精度共鸣。皮尔逊、MST和Infomap配置被证明是恢复传统行业分类最稳健的基础。然而,当更深入的观察需要揭示局部结构和异质社区的编织时,通过PMFG的结构松弛显示出其优越性。在残差信息检测领域,MI自适应分箱似乎比kNN更为成比例;基于直方图的正则化成功抑制了经验噪声,同时没有扫除非线性依赖的痕迹。最终,MI和PMFG的协同作用并非旨在取代线性相关的主导地位,而是为挖掘隐藏的经济子结构(例如商品体制的内聚性)提供一种必要的分析视角,这些结构早已超越市场正式部门的严格界限。

英文摘要

The collective movement of stock prices harbors complex interdependencies that are conventionally simplified only through a linear lens. This paper explores computed structural network representations in the Indonesian capital market by testing the limits of Pearson correlation and Mutual Information (MI) in unveiling the spectral dynamics of the market. Across 2,328 rolling observation windows from 2015 to 2025, we examine 24 methodological configurations that combine three dependency estimators (Pearson, MI adaptive binning, and MI-kNN), two graph filtering schemes (Minimum Spanning Tree/MST and Planar Maximally Filtered Graph/PMFG), and four community decoders. The empirical results unveil a fundamental reality: topological richness does not always resonate with sectoral classification precision. The Pearson, MST, and Infomap configuration is shown to remain the most robust foundation for recovering conventional sectoral taxonomy. Nevertheless, when deeper observation demands the exposition of local structures and the weave of heterogeneous communities, the architectural relaxation through PMFG demonstrates its superiority. In the realm of residual information detection, MI adaptive binning appears far more proportional than kNN; histogram-based regularization successfully tames empirical noise without sweeping away traces of non-linear dependency. Ultimately, the synergy of MI and PMFG is not positioned to dethrone the dominance of linear correlation, but rather to provide an essential analytical lens for excavating hidden economic sub-structures -- such as the cohesion of commodity regimes -- that have long transcended the rigid boundaries of the market's formal sectors.
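One of the 24 pipelines, Pearson dependency followed by MST filtering, can be sketched in plain Python. The distance $d = \sqrt{2(1-\rho)}$ is the transformation commonly used for correlation-based trees and is an assumption here, since the abstract does not state which one the authors use. The ticker names and return series are made-up toy data, and community detection (Infomap) is omitted.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Sample Pearson correlation of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def correlation_mst(returns):
    """Kruskal's algorithm on distances d = sqrt(2 * (1 - rho))."""
    names = sorted(returns)
    edges = sorted(
        # max(0.0, ...) guards against rho exceeding 1 by rounding error
        (math.sqrt(max(0.0, 2.0 * (1.0 - pearson(returns[a], returns[b])))), a, b)
        for a, b in combinations(names, 2)
    )
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    tree = []
    for dist, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # edge joins two components, so keep it
            parent[root_a] = root_b
            tree.append((a, b, dist))
    return tree


returns = {
    "AAA": [0.01, -0.02, 0.03, -0.01, 0.02],
    "BBB": [0.02, -0.04, 0.06, -0.02, 0.04],  # moves in lockstep with AAA
    "CCC": [0.00, 0.01, -0.01, 0.02, -0.02],
    "DDD": [0.03, 0.00, 0.01, -0.02, 0.01],
}
tree = correlation_mst(returns)
```

The MST keeps only $n-1$ of the $n(n-1)/2$ pairwise links, which is why it is the stricter of the two filters. PMFG retains more edges, $3(n-2)$, at the cost of a more involved planarity-constrained construction.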

2606.07448 2026-06-08 cs.SE 新提交

Agentic Very Much! Adoption of Coding Agent in New GitHub Projects

Agentic Very Much! 新GitHub项目中编码助手的采用

Romain Robbes, Théo Matricon, Thomas Degueule, Andre Hora, Stefano Zacchiroli

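AI summary: Studies the adoption of coding agents in newly created GitHub projects, finding an adoption rate more than twice that of the authors' previous study and a noticeably higher proportion of AI-assisted commits.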

详情
AI中文摘要

在之前的工作中,我们调查了GitHub项目中编码助手的采用情况,发现其非常显著。本研究延续这一工作线,但分析了之前研究之后创建的新项目。在这个新样本中,我们发现编码助手的采用率是之前的两倍以上。我们还发现采用强度显著增加,因为AI辅助提交的比例明显更高,尽管有强烈迹象表明我们并未检测到全部。

英文摘要

In previous work, we investigated the adoption of coding agents in GitHub projects, finding that it was very significant. This study follows this line of work but analyses new projects that were created after the previous study. In this new sample, we find that the adoption of coding agents is more than twice as high. We also find that the adoption is significantly more intensive, as the proportion of AI-assisted commits is noticeably higher, despite strong signs that we do not detect all of it.

2606.07439 2026-06-08 cs.AR 新提交

A 65 nm Multi-Modal Bayesian Inference Engine with 16.3 fJ/Sample Calibration-Free GRNG for Risk-Aware At-Home Skin Lesion Screening

65 nm 多模态贝叶斯推理引擎,具有 16.3 fJ/样本免校准 GRNG,用于风险感知的家庭皮肤病变筛查

Steven Davis, Likai Pei, Jianbo Liu, Zephan M. Enciso, Boyang Cheng, Xueji Zhao, Danny Z. Chen, Ningyuan Cao

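AI summary: Presents a 65 nm risk-aware multimodal Bayesian inference engine whose compute-in-memory architecture performs in-word Mixture-of-Gaussian sampling, improving uncertainty modeling and outperforming state-of-the-art unimodal Bayesian neural networks in robustness and balanced accuracy.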

详情
AI中文摘要

我们提出了一种65 nm风险感知多模态贝叶斯推理引擎,用于在不受控制的家庭条件下进行隐私保护、完全设备上的皮肤病变筛查。所提出的存内计算架构执行词内混合高斯采样,改进了超越传统单模态贝叶斯神经网络的不确定性建模。这种增加的概率表达能力将等风险操作覆盖范围提高了1.4倍,对用户数据扰动的鲁棒性提高了>1.5倍,工艺变化弹性提高了5.5倍,并且与最先进的单模态贝叶斯神经网络相比,平衡精度提高了1.8%。硬件鲁棒性进一步通过使用互补工艺变化的免校准高斯随机数生成来支持,实现了16.3 fJ/样本和168.6 GSa/s/mm^2的效率。这些结果展示了一种实用、节能且风险感知的边缘AI解决方案,用于隐私敏感的医疗筛查。

英文摘要

We present a 65-nm risk-aware multimodal Bayesian inference engine for privacy-preserving, fully on-device skin lesion screening under uncontrolled at-home conditions. The proposed compute-in-memory architecture performs in-word Mixture-of-Gaussian sampling, improving uncertainty modeling beyond conventional unimodal Bayesian neural networks. This added probabilistic expressiveness increases equal-risk operating coverage by 1.4x, improves robustness to user-data perturbations by >1.5x, enhances process-variation resilience by 5.5x, and improves balanced accuracy by 1.8% over state-of-the-art unimodal Bayesian neural networks. Hardware robustness is further supported by calibration-free Gaussian random-number generation using complementary process variation, achieving 16.3 fJ/sample and 168.6 GSa/s/mm^2 efficiency. These results demonstrate a practical, energy-efficient, and risk-aware edge-AI solution for privacy-conscious medical screening.
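The difference between a unimodal Gaussian and a Mixture-of-Gaussian distribution over a weight can be illustrated in software. The sketch below is purely conceptual and says nothing about the in-memory hardware implementation; the two-component mixture and its parameters are arbitrary values assumed for the example.

```python
import random


def sample_mixture(weights, means, stds, rng):
    """Draw one value: pick a component by its weight, then sample that Gaussian."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[k], stds[k])


rng = random.Random(0)
weights, means, stds = [0.3, 0.7], [-1.0, 2.0], [0.1, 0.1]

samples = [sample_mixture(weights, means, stds, rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
expected_mean = sum(w * m for w, m in zip(weights, means))  # analytic mixture mean
low_mode_fraction = sum(s < 0.5 for s in samples) / len(samples)
```

A single Gaussian fitted to these samples would place most of its mass between the two modes, where almost no samples fall. Capturing both modes is the added expressiveness that a mixture posterior offers over a unimodal one.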

2606.07434 2026-06-08 cs.GT 新提交

Evidence Markets

证据市场

Safwan Hossain, Gabriel Andrade, Chengqi Zang, Yiling Chen

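AI summary: Introduces evidence markets, which use a logarithmic market scoring rule with a dynamically adjusted liquidity parameter to incentivize the submission of evidence alongside beliefs and support endogenous resolution; the authors prove bounded platform loss, evidence rewards proportional to current market uncertainty, and that truthful reporting is an $\varepsilon$-DSIC strategy.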

详情
AI中文摘要

现代预测市场面临两个限制,限制了它们在多种场景中的适用性:(i)它们揭示了人群的信念,但没有揭示这些信念背后的证据或推理,以及(ii)它们需要一个具有外部真实结果的事件,该结果在已知的未来日期解决。我们通过引入证据市场来应对这两个挑战,证据市场是预测市场的一种推广,它激励在提交信念的同时提交证据,并且如果外部解决不可行,可以使用众包证据内生解决。其核心是使用对数市场评分规则,其流动性参数随累积证据质量动态变化。我们证明平台损失有界,证据根据当前市场不确定性获得奖励,并且可以通过自动做市商等价实现。在市场基于提交的证据内生解决的情况下,我们描述了隐瞒证据如何改变交易者对解决的信念,并利用它证明真实信念和证据报告始终是一个$\varepsilon$-占优策略激励兼容(DSIC)策略。为了解决操作上的考虑,我们提出了通过带有质押的LLM-as-a-Judge框架进行证据验证,并给出了一种不受验证瓶颈限制的异步执行算法。在整个工作中,我们使用LLM评估——确定哪个模型最适合给定任务——作为我们提出的市场的一个显著且具有代表性的运行示例。

英文摘要

Modern prediction markets face two limitations that restrict their applicability in a range of settings: (i) they reveal what the crowd believes but not the evidence or reasoning behind those beliefs, and (ii) they require an event with an external ground truth that resolves at a known future date. We address these twin challenges by introducing evidence markets, a generalization of prediction markets that incentivizes the submission of evidence alongside beliefs and can be endogenously resolved using the crowd-sourced evidence if external resolution is not possible. At its core, the market uses a logarithmic market scoring rule whose liquidity parameter changes dynamically with the accumulated evidence quality. We prove that platform loss is bounded, that evidence is rewarded in proportion to the current market uncertainty, and that the market can be equivalently implemented through an automated market maker. In the case where the market resolves endogenously based on submitted evidence, we characterize how withholding evidence shifts a trader's belief about resolution and use it to prove that truthful belief and evidence reporting is always an $\varepsilon$-dominant strategy incentive compatible (DSIC) strategy. To address operational considerations, we propose evidence verification via an LLM-as-a-Judge framework with staking and give an asynchronous execution algorithm that is not bottlenecked by verification. Throughout the work, we use LLM evaluations -- determining which model is best for a given task -- as a salient and representative running example for our proposed market.
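The building block named in the abstract, the logarithmic market scoring rule (LMSR), has a compact closed form. The sketch below implements the standard LMSR with a fixed liquidity parameter `b`, using cost function $C(q) = b \ln \sum_i e^{q_i/b}$. The paper's evidence-dependent liquidity schedule is not reproduced here, and the two-outcome market with `b = 10` is an assumed toy setting.

```python
import math


def lmsr_cost(q, b):
    """C(q) = b * ln(sum_i exp(q_i / b)), computed in a numerically stable way."""
    m = max(q)
    return m + b * math.log(sum(math.exp((qi - m) / b) for qi in q))


def lmsr_prices(q, b):
    """Instantaneous prices: a softmax of q / b, interpretable as probabilities."""
    m = max(q)
    w = [math.exp((qi - m) / b) for qi in q]
    total = sum(w)
    return [wi / total for wi in w]


def trade_cost(q, delta, b):
    """Amount a trader pays to move outstanding shares from q to q + delta."""
    new_q = [qi + di for qi, di in zip(q, delta)]
    return lmsr_cost(new_q, b) - lmsr_cost(q, b)


b = 10.0
q = [0.0, 0.0]                         # no shares sold yet: prices are 50/50
prices = lmsr_prices(q, b)
paid = trade_cost(q, [10.0, 0.0], b)   # buy 10 shares of outcome 0

# Worst case for the market maker: a trader buys many shares of the winner.
big = 1000.0
maker_loss = big - trade_cost(q, [big, 0.0], b)   # payout minus money collected
loss_bound = b * math.log(len(q))                 # classical bound b * ln(n)
```

With a fixed `b` the market maker's loss never exceeds `b * ln(n)`. Letting `b` vary over time, as the abstract describes, changes how deep the market is at each moment, and keeping the loss bounded under that variation is one of the results the authors state they prove.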

2606.07427 2026-06-08 cs.CE 新提交

High-Frequency Preconditioners for Electromagnetic Integral Equations Based on Helmholtz Regularizations

基于Helmholtz正则化的电磁积分方程高频预处理器

S. Ciciriello, V. Giunzioni, A. Dély, A. Merlini, S. B. Adrian, F. P. Andriulli

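AI summary: To address the ill-conditioning of the electric field integral equation across frequency and discretization regimes, proposes a new preconditioning strategy for the shifted Helmholtz operator that stabilizes the iteration count and attains quasi-linear complexity.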

详情
AI中文摘要

通过边界元法数值求解电场积分方程(EFIE)可能因不同情况下的条件数问题而面临计算挑战,例如:(i)频率降低而离散化密度保持不变时,(ii)频率保持不变而离散化细化时,以及(iii)频率随离散化密度增加而增加时。为了解决这些问题,文献中已经开发了几种针对相关矩阵系统的预处理方法,但只有少数方法能同时处理所有情况。本文研究了其中一种技术,并提出了一种加速相关矩阵-向量乘积(MVP)的策略。特别地,我们针对移位Helmholtz算子提出了一种新颖的预处理策略,而标准伪逆技术对此算子效果不佳。相反,我们的预处理技术的应用在所有上述情况下稳定了迭代次数。鉴于这些成果,当使用适当的加速策略时,移位Helmholtz算子的伪逆可以在准线性复杂度下获得,从而使得EFIE的数值解具有相同的复杂度。

英文摘要

The numerical solution of the Electric Field Integral Equation (EFIE) via the Boundary Element Method (BEM) can be computationally challenging due to conditioning issues arising in different regimes, such as (i) when the frequency decreases and the discretization density remains constant, (ii) when the frequency is kept constant while the discretization is refined, and (iii) when the frequency increases along with the discretization density. To address these issues, several preconditioning approaches for the related matrix system have been developed in the literature, only a few of which address all regimes simultaneously. This paper investigates one of these techniques and presents a strategy for accelerating the associated matrix-vector products (MVPs). In particular, we propose a novel preconditioning strategy for the shifted Helmholtz operator, for which standard pseudo-inversion techniques have shown unsatisfactory results. Instead, the application of our preconditioning technique stabilizes the number of iterations in all the aforementioned regimes. In view of these achievements, the pseudo-inversion of the shifted Helmholtz operator can be obtained in quasi-linear complexity when proper acceleration strategies are used, thus enabling the numerical solution of the EFIE with the same complexity.

2606.07420 2026-06-08 cs.CR 新提交

Lost in Migration: Exposing Android Framework Vulnerabilities in Parallel Java-Kotlin Implementations

迷失在迁移中:揭示Android框架中Java-Kotlin并行实现的安全漏洞

Rui Li, Wenrui Diao, Debin Gao

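AI summary: Presents the first systematic study of Java-Kotlin parallel implementations in the Android framework and builds ParaDroid, an analysis framework that identifies and compares parallel methods; it finds 329 parallel method pairs and 37 vulnerable divergences, of which three vulnerabilities and two bugs have been confirmed and two CVE IDs assigned.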

Comments 14 pages

详情
AI中文摘要

Android已在应用和核心系统组件中采用Kotlin与Java并存。在此转变过程中,我们在Android开源项目(AOSP)中观察到并行实现,即同一组件同时用Java和Kotlin实现。原则上,它们的功能目的相同。实际上,可能出现微妙的语义差异。这些差异本身并非漏洞,但提供了可能揭示周围执行逻辑缺陷的有用线索。据我们所知,本文首次系统研究Android框架中Java-Kotlin并行实现,并考察其安全影响。我们设计并构建了ParaDroid,一个大规模识别并行方法并比较其行为的分析框架。ParaDroid将代码标准化为字节码级中间表示,重建类到源文件的映射,并使用大语言模型推理方法语义并识别行为差异。在AOSP Android 14-16上评估,ParaDroid识别了329个并行方法对和37个可利用的差异。我们负责任地向Android安全团队披露了可利用问题。已确认3个漏洞和2个缺陷,并分配了2个CVE编号。我们的结果表明,并行Java-Kotlin代码路径为发现现代Android中的安全缺陷提供了实用表面。

英文摘要

Android has adopted Kotlin alongside Java across apps and core system components. During this shift, we observe parallel implementations in the Android Open Source Project (AOSP) where the same component is implemented in both Java and Kotlin. In principle, their functional purposes are identical. In practice, subtle semantic divergences can appear. Such divergences are not vulnerabilities by themselves, but they provide useful clues that may reveal flaws in surrounding enforcement logic. To the best of our knowledge, this paper presents the first systematic study of Java-Kotlin parallel implementations in the Android framework and examines their security implications. We design and build ParaDroid, an analysis framework that identifies parallel methods at scale and compares their behaviors. ParaDroid normalizes code into a bytecode-level intermediate representation, reconstructs class-to-source mappings, and uses large language models to reason about method semantics and identify behavioral divergences. Evaluated on AOSP Android 14-16, ParaDroid identified 329 parallel method pairs and 37 vulnerable divergences. We responsibly disclosed the exploitable issues to the Android Security Team. Three vulnerabilities and two bugs have been confirmed, and two CVE IDs have been assigned. Our results demonstrate that parallel Java-Kotlin code paths provide a practical surface for discovering security flaws in modern Android.

2606.07408 2026-06-08 cs.DS cs.DB cs.FL cs.LO 新提交

Earliest query answering over streamed trees

流式树上的最早查询回答

Mateusz Gienieczko, Martín Muñoz, Filip Murlak, Charles Paperman

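AI summary: For massive streamed JSON/XML documents, shows that earliest query answering is achievable for all unary queries expressible in monadic second-order logic (MSO), with constant update time, low latency, and a low memory footprint.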

详情
AI中文摘要

流式处理允许对大规模JSON或XML文档执行查询,这些文档的大小使得完全解析成树变得不可行。最早查询回答是一种激进的方法,用于减少延迟和内存占用。为了最小化延迟,一旦保证某个文档节点是答案(无论文档如何结束),就必须立即返回该节点。类似地,为了最小化内存占用,一旦某个节点不可能成为答案(无论文档如何结束),就必须立即丢弃它。对于基于从根路径选择节点的简单查询,每个节点的决定可以当场做出,但诸如XPath或JSONpath等实用语言支持过滤器,允许基于从文档各个部分(可能更下游)收集的信息来选择节点。这使得最早查询回答成为一项具有挑战性的任务,因为候选节点必须保留在内存中,直到明确可以安全返回或丢弃它们。我们证明,对于所有可用单子二阶逻辑(MSO)表达的一元查询,这都可以实现,同时确保常数时间更新——前提是节点通过传递合适的迭代器返回,而不是逐个返回。

英文摘要

Streaming allows executing queries over massive JSON or XML documents whose size makes it infeasible to fully parse them into a tree. Earliest query answering is a radical approach to reducing latency and memory footprint. To minimize latency, a document node must be returned as soon as the node is guaranteed to be an answer regardless of how the document ends. Similarly, to minimize memory footprint, a node must be discarded as soon as it cannot become an answer regardless of how the document ends. For simple queries that select nodes based on the path from the root, the decision for each node can be made on the spot, but practical languages such as XPath or JSONpath support filters, which allow selecting nodes based on information collected from various parts of the document, possibly further down the stream. This makes earliest query answering a challenging task, as candidate nodes must be kept in memory until it becomes clear that they can be safely returned or discarded. We show that this can be done for all unary queries expressible in monadic second order logic (MSO), while ensuring constant update time -- provided that nodes are returned by passing a suitable iterator, rather than one by one.
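The behaviour of earliest query answering on a filter query can be shown with a hand-written example. The sketch below hard-codes a single query, "select every `item` node that has an `ok` child", over a stream of open and close events; it is not the MSO construction from the paper, and the event encoding and labels are assumptions made for illustration.

```python
def earliest_answers(events):
    """Yield (event_index, node_id) as soon as an `item` node is a certain answer."""
    stack = []     # currently open nodes as [node_id, label, already_returned]
    next_id = 0
    for index, (kind, label) in enumerate(events):
        if kind == "open":
            parent = stack[-1] if stack else None
            if label == "ok" and parent and parent[1] == "item" and not parent[2]:
                parent[2] = True
                yield index, parent[0]   # certain now, whatever follows
            stack.append([next_id, label, False])
            next_id += 1
        else:
            stack.pop()  # an unreturned candidate can no longer match: discard it


events = [
    ("open", "root"),
    ("open", "item"),   # node 1: candidate, must be buffered
    ("open", "note"), ("close", None),
    ("open", "ok"),     # event 4: node 1 becomes a certain answer here
    ("close", None),
    ("open", "note"), ("close", None),
    ("close", None),    # event 8: node 1 closes, long after it was returned
    ("open", "item"),   # node 5: candidate
    ("open", "note"), ("close", None),
    ("close", None),    # node 5 closes without an `ok` child and is discarded
    ("close", None),
]
answers = list(earliest_answers(events))
```

Node 1 is reported at event 4 instead of at its closing event 8, and node 5 is dropped the moment it closes. Memory holds only the currently open nodes, which is the latency and footprint behaviour the abstract describes for a far more general query class.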