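arXivDaily: arXiv daily academic digest, updated Monday through Friday

Science and Medicine

AI for Science

AI for science: proteins, molecules, drugs, materials, weather, physics, and mathematics.

Indexed for today / current date: 116. Signal sources: cs.LG, q-bio, physics, cond-mat, math, stat.ML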
2606.20315 2026-06-19 q-bio.GN cs.CR New submission 90%

bioETH-Beacon: A Confidential On-Chain Genomic Beacon with Encrypted Counts, Filters, and Bounded Noise over a Fully Homomorphic EVM

Christos Galanopoulos, Kimon Antonios Provatas, Ilias Georgakopoulos-Soares

Topic match (Other AI for Science): genomic beacon queries and privacy-preserving genomics.

AI summary: Presents bioETH-Beacon, a smart-contract prototype on a fully homomorphic EVM that answers encrypted genomic beacon queries, uses encrypted counts, bounded noise, and access control to resist membership-inference attacks, and reduces query cost.

Comments 11 pages, 6 figures, 8 tables. Research prototype for privacy-preserving genomics using Fully Homomorphic Encryption (FHE) on blockchain (fhEVM)

Abstract

The Global Alliance for Genomics and Health (GA4GH) Beacon protocol lets researchers ask whether a genomic variant has been observed in a participating cohort and receive aggregate variant-level counts. As Beacon networks grow, two privacy risks remain: host institutions can see plaintext queries, and repeated rare-variant queries can support membership-inference attacks. We present bioETH-Beacon, a smart-contract prototype that runs the Beacon "aggregate count" query over encrypted data on a fully homomorphic Ethereum Virtual Machine (fhEVM). Hospitals upload encrypted marker-count entries, authorized researchers submit encrypted marker queries, and the contract returns an encrypted answer that is released, via an off-chain key-management service, only to the requester named in the contract's on-chain ACL. The design is organized as a 3x4 tier-by-query-family grid spanning genotype, sex, age, and phenotype queries, with tiers that trade stronger confidentiality for lower query cost. For genotype paths, the prototype can add bounded on-chain noise to mitigate probing attacks. Experiments on synthetic panels derived from a Polygenic Score (PGS) catalog show the expected scaling behavior and demonstrate that pre-aggregation can substantially reduce query gas when public marker presence is an acceptable trade-off. Overall, bioETH-Beacon provides a research prototype for confidential Beacon-style genomic querying without a trusted compute evaluator.
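The bounded-noise idea can be sketched in plaintext for intuition. The abstract does not state the noise distribution or the bound, so the uniform integer noise, the bound of 2, and the name `noisy_count` are assumptions; nothing in this sketch is encrypted or runs on-chain.

```python
import random

def noisy_count(true_count: int, bound: int, rng: random.Random) -> int:
    """Return the count perturbed by integer noise in [-bound, bound], floored at 0."""
    noise = rng.randint(-bound, bound)  # uniform bounded noise (an assumption)
    return max(0, true_count + noise)

rng = random.Random(0)
# Repeated probes of the same rare variant see answers within a fixed band.
answers = [noisy_count(3, bound=2, rng=rng) for _ in range(5)]
```

Because the noise is bounded, every answer stays within `bound` of the true count, which limits the distortion while blurring single-count differences.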

2606.20000 2026-06-19 hep-ph physics.comp-ph New submission 90%

Two Flavon Froggatt-Nielsen Models with Genetic Algorithms

Miguel Crispim Romão, Stephen F. King

Topic match (Other AI for Science): genetic-algorithm scan of Froggatt-Nielsen models.

AI summary: Uses a genetic algorithm to systematically scan two-flavon Froggatt-Nielsen models, finds that the relative phase between the flavon vacuum expectation values provides a source of CP violation, and identifies over 100,000 phenomenologically viable models.

Comments 37 pages, 7 figures

Abstract

We present the first systematic and comprehensive scan of two-flavon Froggatt-Nielsen (FN) models, employing artificial intelligence techniques to explore the high-dimensional, mixed discrete-continuous parameter space. Extending the standard single-flavon FN framework to a two-flavon setup in which separate flavon fields couple independently to the up- and down-type sectors, we demonstrate that the relative phase between their vacuum expectation values (vevs) provides a natural and generic source of CP violation absent in single-flavon models. To explore this enlarged model space, we cast the search for phenomenologically viable models as a multi-objective optimisation problem, formulating each experimental constraint as a separate objective, and employ the Non-dominated Sorting Genetic Algorithm III to simultaneously fit all 18 FN charges, 45 Wilson coefficients, and flavon parameters to both the quark and lepton sectors. Our approach requires no separate training phase and identifies phenomenologically viable models orders of magnitude faster than prior reinforcement learning methods. Imposing experimental constraints on CKM and PMNS mixing angles and CP phases, charged fermion masses, and neutrino squared-mass differences, we discover over $100\,000$ unique viable models with a remarkably low duplication rate, indicating that the space of valid two-flavon FN realisations has not been exhausted. Both Normal and Inverted neutrino mass squared orderings are realised, with the relative hierarchy between the flavon vevs producing qualitatively distinct predictions for the effective neutrinoless double beta decay mass $m_{ee}$. We further demonstrate the existence of minimal FN realisations with maximal flavon exponent as small as three, and of models reproducing charged fermion masses to within $6\%$ without any dedicated continuous parameter optimisation.
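Treating each constraint as its own objective means candidates are compared by Pareto dominance rather than by a single score. The sketch below shows only that first ingredient of non-dominated sorting, assuming all objectives are minimised; NSGA-III's reference-point niching is not shown, and the objective values are made up.

```python
from typing import List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def first_front(points: List[Sequence[float]]) -> List[int]:
    """Indices of points that no other point dominates."""
    return [i for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]

# Hypothetical (objective_1, objective_2) misfit values for four candidates.
candidates = [(0.1, 0.9), (0.5, 0.5), (0.9, 0.1), (0.6, 0.6)]
front = first_front(candidates)
```

Here the last candidate is dominated by `(0.5, 0.5)`, so only the first three survive into the front.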

2606.19737 2026-06-19 stat.ME stat.ML New submission 85%

Calibration without labels in multiple testing

Adway S. Wadekar, Jake A. Soloff

Topic match (Other AI for Science): label-free calibration for multiple testing, with applications in statistics and neuroscience.

AI summary: Because true labels are never observed in multiple testing, the authors build pseudo-labels from the spacings of ordered p-values to calibrate local false discovery rates, and show that q-values can be severely miscalibrated in the psychology and neuroscience literature.

Abstract

Large-scale hypothesis testing supports probability claims about individual hypotheses, as in empirical Bayes methods for estimating local false discovery rates. We study how such claims can be interpreted as approximately calibrated forecasts of the null hypothesis, yielding interpretable error probabilities even under model misspecification. Our approach draws conceptual inspiration from probabilistic forecasting but addresses a different challenge: unlike forecasting, where labels are eventually observed, in multiple testing the ground truth is never revealed, so calibration must be assessed stochastically and established indirectly. We address this challenge by constructing a set of pseudo-labels, derived from the spacings of ordered $p$-values, which have the local false discovery rate as their regression target. Our construction unlocks existing tools for assessing and performing post-hoc calibration in multiple testing. Notably, we find on a large-scale empirical survey of published psychology and neuroscience literature that the $q$-value, a popular error measure based on the false discovery rate, can be severely miscalibrated.
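For readers unfamiliar with the q-value mentioned at the end of the abstract, here is a minimal sketch of the Benjamini-Hochberg adjusted p-value, which is one common way to compute it (equivalent to a q-value with the null proportion taken as 1). This is not the paper's pseudo-label construction, whose details the abstract does not give; the function name and input p-values are illustrative.

```python
def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

q = bh_q_values([0.01, 0.04, 0.03, 0.20])
```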

2606.19762 2026-06-19 q-bio.MN New submission 85%

Oscillations and Spatial Patterns in Large-Scale Stochastic Gene Regulatory Networks

Manuel Eduardo Hernández-García, Jorge Velázquez-Castro

Topic match (Other AI for Science): mathematical modelling of oscillations and spatial patterns in gene regulatory networks.

AI summary: Studies cyclic gene regulatory networks with negative feedback and diffusion, analyses their stability with deterministic and stochastic methods, and finds that stochastic fluctuations can induce Turing instability, offering a new perspective on pattern formation in development.

Comments 16 pages, 10 figures

Abstract

Gene regulatory networks (GRNs) are fundamental to cellular growth and tissue formation, orchestrating spatially and temporally regulated gene expression during development. These networks are inherently subject to intrinsic fluctuations arising from molecular noise, making the analysis of their stability essential for understanding robust pattern formation and developmental dynamics of the organism. In this study, we analyze the stability and dynamics of cyclic GRNs with negative feedback and diffusion, considering both deterministic and stochastic approaches. In the deterministic case, the system exhibits a bifurcation between stability and instability, leading to Hopf instability in the absence of diffusion and to Turing-Hopf instability when diffusion is included. It was observed that the discretization of the spatial domain introduces additional unstable modes, enabling a wider range of patterns. The stochastic framework based on the second-moment approach, which incorporates intrinsic fluctuations, reveals that for small system sizes, fluctuations can dominate the dynamics and induce stochastic Turing instability, even when the system is stable in the absence of diffusion. Notably, Turing instabilities can emerge even when all variables have the same diffusion rate. The developed framework provides a systematic method for analyzing the stability of high-dimensional stochastic systems with diffusion, thereby simplifying the prediction of Turing and Turing-Hopf instabilities. These findings contribute to a deeper understanding of the complex dynamics and pattern formation in GRNs, with potential implications for biological processes, such as cellular differentiation and development.
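For intuition, the textbook linear-stability criterion for a deterministic reaction-diffusion system can be written as below. This is standard background rather than the paper's second-moment framework, and the symbols $\mathbf{u}$, $\mathbf{f}$, $J$, $D$, and $k$ are notation assumed here.

```latex
\partial_t \mathbf{u} = \mathbf{f}(\mathbf{u}) + D \nabla^2 \mathbf{u},
\qquad
\delta\mathbf{u}(x,t) \propto e^{\lambda t + i k x}
\;\Longrightarrow\;
\det\!\left(J - k^2 D - \lambda I\right) = 0,
\qquad
J = \left.\frac{\partial \mathbf{f}}{\partial \mathbf{u}}\right|_{\mathbf{u}^*}.
```

A Hopf instability corresponds to a complex-conjugate pair of eigenvalues crossing $\operatorname{Re}\lambda = 0$ at $k = 0$, and a Turing instability to a real eigenvalue becoming positive at some $k \neq 0$ while the $k = 0$ mode stays stable. With equal diffusion rates, $D = dI$, the eigenvalues are those of $J$ shifted by $-d k^2$, so in this deterministic picture diffusion cannot destabilise a stable homogeneous state. The equal-diffusion Turing instability noted in the abstract is therefore presumably a stochastic effect.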

2606.19396 2026-06-19 q-bio.QM New submission 85%

BioHarness: Substrate-Aware Evidence Assembly for Biomedical Question Answering across Literature, Knowledge Bases, and Biological Atlases

Meng Xiao, Chuan Qin, Jinmiao Chen, Yihang Cheng, Yuanchun Zhou, Hengshu Zhu

Topic match (Other AI for Science): retrieval-augmented generation system for biomedical question answering.

AI summary: Presents BioHarness, which uses cascade control to selectively assemble evidence across literature retrieval, knowledge bases, and biological atlases, improving the pooled score on 19,302 biomedical QA items from 65.9 to 71.0 over the strongest non-oracle baseline.

Comments 14 Pages, 11 Figures, Keywords: biomedical question answering; retrieval-augmented generation; large language models; evidence assembly; biomedical knowledge bases; biological atlases

Abstract

Motivation: Biomedical question answering often requires evidence beyond topically retrieved literature, including gene alias resolution, database identifier normalization, and atlas-derived biological measurements. However, existing retrieval-augmented generation (RAG) systems typically follow a fixed workflow and lack an explicit mechanism for deciding when retrieved text is sufficient, when curated biomedical knowledge is required, or when executable evidence assembly over structured measurements should be invoked. This motivates a substrate-aware large language model (LLM) harness that selectively assembles sufficient evidence across literature, knowledge bases, and biological atlases. Results: We introduce BioHarness, an LLM harness for staged biomedical evidence assembly across literature retrieval, curated biomedical knowledge resources, and atlas-derived structured measurements. BioHarness first attempts to answer from reranked literature evidence and escalates through grounded cascade control to REPL-style evidence assembly only when the current evidence is uncertain, weakly grounded, or substrate-mismatched. Across 19,302 biomedical QA items spanning seven answer formats, BioHarness improves the pooled score from 65.9 to 71.0 over the strongest non-oracle baseline. Ablations, case studies, and backbone-scaling analyses show that these gains arise from repairing evidence-substrate mismatches through reranking, entity grounding, and structured measurement access, rather than from indiscriminately invoking more reasoning steps, retrieving additional literature, or relying on a particular answer-model scale.
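The staged escalation described above can be sketched as a simple cascade. The abstract names three escalation triggers (uncertain, weakly grounded, substrate-mismatched) but not how they are scored, so the single scalar confidence, the threshold, and the stage functions here are all hypothetical stand-ins.

```python
from typing import Callable, List, Tuple

# A stage maps a question to (answer, confidence in [0, 1]); hypothetical interface.
Stage = Callable[[str], Tuple[str, float]]

def cascade(question: str, stages: List[Stage], threshold: float) -> Tuple[str, int]:
    """Try stages in order; stop at the first whose confidence reaches the threshold.

    Assumes at least one stage. Returns (answer, index of the stage that answered).
    """
    answer = ""
    for index, stage in enumerate(stages):
        answer, confidence = stage(question)
        if confidence >= threshold:
            return answer, index
    return answer, len(stages) - 1  # nothing was confident: keep the last answer

def literature_stage(question: str) -> Tuple[str, float]:
    return ("unsure", 0.3)  # placeholder for answering from reranked literature

def assembly_stage(question: str) -> Tuple[str, float]:
    return ("TP53", 0.9)    # placeholder for REPL-style evidence assembly

result = cascade("Which gene ...?", [literature_stage, assembly_stage], threshold=0.7)
```

The cheaper stage runs first and the expensive one only runs when needed, which is the cost argument behind a cascade.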

2606.20451 2026-06-19 stat.ML cs.LG stat.AP stat.CO New submission 85%

SSH-Net: A Deep Neural Network for Predicting Failure Time Distribution Functions under Competing Risks with Application to GPU Data

Jie Min, Yueyao Wang, Mengkun Chen

Affiliations: Department of Mathematics & Statistics, University of South Florida; School of Statistics and Data Science, Zhejiang Gongshang University; Department of Statistics, Virginia Tech

Topic match (Other AI for Science): proposes a deep neural network for failure-time prediction, applied to GPU data.

AI summary: Proposes the Structured Segmented Hazard Deep Neural Network (SSH-Net), which ties network structure to data structure so that different covariate groups influence predictions through separate sub-networks, and predicts failure time distribution functions under a competing-risks framework; simulations and GPU data support its accuracy.

Abstract

Competing risks are commonly observed in engineering fields and can bring challenges to time-to-event data modeling when the application scenarios are complicated. Recently, deep neural networks have received great attention for prediction with competing risks, due to their flexibility and high learning capability. However, the complexity of neural network structure brings extra difficulty in hyperparameter tuning based on different data inputs. Additionally, when an engineered system has complex physical structures with multiple hierarchical levels, treating all structural levels as a single group of inputs may fail to capture critical information. To address these issues, we propose a Structured Segmented Hazard Deep Neural Network (SSH-Net) for failure time prediction under a cause-specific competing risks framework. Our approach associates neural network structure with data structures, and allows different covariate groups to impact the failure prediction through separate sub-networks. The neural network is constructed based on a cause-specific competing risks model. The SSH-Net outputs cause-specific hazard functions, and utilizes the penalized log-likelihood as the loss function. The prediction accuracy of SSH-Net is validated through simulation studies by evaluating the Brier score, the area under the receiver operating characteristic curve (AUC), and the root mean square error (RMSE) of the predicted cause-specific cumulative incidence function. We further demonstrate the model's ability to predict failure time distribution functions using the Titan GPU failure time data.
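To show how cause-specific hazards turn into failure-time distribution functions, here is a discrete-time sketch of the cumulative incidence function. The discrete-time form and the toy hazard values are assumptions made for illustration; the abstract does not give SSH-Net's actual hazard parameterisation.

```python
def cumulative_incidence(hazards):
    """hazards[t][k] is the cause-k hazard in interval t; returns cif[t][k].

    Uses CIF_k(t) = sum over s <= t of S(s-1) * h_k(s), where S is the
    probability of surviving all causes so far.
    """
    n_causes = len(hazards[0])
    survival = 1.0
    running = [0.0] * n_causes
    cif = []
    for h_t in hazards:
        running = [c + survival * h for c, h in zip(running, h_t)]
        survival *= 1.0 - sum(h_t)  # survive every cause in this interval
        cif.append(list(running))
    return cif

# Two intervals, two competing causes, constant toy hazards.
cif = cumulative_incidence([[0.1, 0.2], [0.1, 0.2]])
```

At every time step the cumulative incidences over all causes plus the overall survival probability sum to one, which is a quick sanity check on any such model's output.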

2606.19643 2026-06-19 stat.ML cs.LG New submission 85%

Variational Consensus Monte Carlo for Bayesian Mixture

Julie Fendler, Francesca L. Crowe, Tom Marshall, Sylvia Richardson, Paul D. W. Kirk

Affiliations: MRC Biostatistics Unit, University of Cambridge; Institute of Applied Health Research, University of Birmingham

Topic match (Other AI for Science): Bayesian mixture models for federated learning, evaluated on electronic health record data.

AI summary: Extends variational Consensus Monte Carlo to over-fitted Bayesian mixture models, with novel cluster-matching algorithms and aggregation strategies that infer the number of clusters and all parameters in a federated learning setting; evaluated on simulated and real electronic health record data.

Abstract

Motivated by the privacy, sensitivity and sharing limitations of health data, we present a comprehensive pipeline for inference of Bayesian mixture models within a federated learning setting, i.e. when data cannot be fully shared or pooled across compute nodes. We adopt a Consensus Monte Carlo (CMC) approach, in which an MCMC algorithm is run independently within each data silo to estimate local posterior distributions, which are then aggregated to approximate the posterior over the full data. The variational CMC approach of Rabinovich, Angelino and Jordan (2015) [1] frames the aggregation step as a variational inference problem, but their application to mixtures assumes the number of clusters and key mixture parameters to be known. Our main methodological contributions are: (i) an extension of variational CMC to over-fitted Bayesian mixture models that infer the number of clusters and all model parameters, without requiring conjugacy; (ii) novel cluster-matching algorithms suitable for cross-silo settings in which not every cluster appears in each local dataset; (iii) a number of inference strategies for the aggregation step, matched to different federated learning constraints; and (iv) guidelines for choosing among these in practice. A comprehensive simulation study validates the framework and allows us to compare to state-of-the-art federated learning alternatives. Notably, we show that when the composition of local datasets reflects the underlying clustering structure in the data, our approach can recover small clusters with greater accuracy than standard MCMC applied to the pooled data. We illustrate the framework on large-scale electronic health record data, identifying multi-morbidity patterns in a British geriatric population.
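The aggregation step can be illustrated with the simplest Consensus Monte Carlo rule: combine the g-th draw from every silo by an inverse-variance weighted average. This is the basic non-variational rule for a scalar parameter, shown only for intuition; the paper's variational aggregation and cluster matching are not reproduced, and the draws below are made up.

```python
from statistics import variance

def consensus_draws(shard_draws):
    """Combine per-silo scalar posterior draws by inverse-variance weighting."""
    weights = [1.0 / variance(draws) for draws in shard_draws]
    total = sum(weights)
    n_draws = min(len(draws) for draws in shard_draws)
    return [sum(w * draws[g] for w, draws in zip(weights, shard_draws)) / total
            for g in range(n_draws)]

shard_a = [0.9, 1.0, 1.1]  # tighter local posterior, so it gets more weight
shard_b = [1.8, 2.0, 2.2]
combined = consensus_draws([shard_a, shard_b])
```

Only the draws leave each silo, never the raw records, which is what makes this family of methods attractive when data cannot be pooled.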

2606.20480 2026-06-19 math.ST stat.ML stat.TH New submission 85%

Leveraging tails for adaptation

Sergios Agapiou, Ismaël Castillo, Paul Egels

Topic match (Other AI for Science): posterior contraction rates in nonparametric Bayes, applied to white noise regression and ReLU neural networks.

AI summary: Studies posterior contraction rates under priors with p-exponential tails in nonparametric Bayes, finds that rates improve as p decreases and that smoothness adaptation is achieved as p→0, with applications to white noise regression and ReLU neural networks.

Comments 59 pages, 3 figures

Abstract

We consider contraction of Bayesian posterior distributions in nonparametric settings where coefficients of a function over a basis or dictionary are given priors with $p$-exponential tails, including Laplace tails $(p=1)$ and heavier tails $(p<1)$. It is shown that contraction rates improve as $p$ decreases and that full adaptation to smoothness, up to logarithmic factors, is obtained in an appropriate $p\to 0$ regime. As applications, we consider both series priors in white noise regression and shallow ReLU neural networks in random design regression. In particular, we show that overparametrised shallow ReLU networks can adapt to any regularity $0\le \beta\le 2$. Through a simulation study, we show strong empirical agreement with the behavior predicted by our theory.
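To get a feel for what $p$-exponential tails look like, the sketch below draws from a density proportional to $\exp(-|\theta|^p)$. It uses the fact that if $G$ is Gamma-distributed with shape $1/p$ and unit scale, then $G^{1/p}$ with a random sign has that density. The unit scale and the function name are assumptions; the paper's priors may include scaling not shown here.

```python
import random

def sample_p_exponential(p: float, rng: random.Random) -> float:
    """Draw from the density proportional to exp(-|x|**p)."""
    magnitude = rng.gammavariate(1.0 / p, 1.0) ** (1.0 / p)
    return magnitude if rng.random() < 0.5 else -magnitude

rng = random.Random(1)
laplace_draws = [sample_p_exponential(1.0, rng) for _ in range(2000)]  # p = 1
heavy_draws = [sample_p_exponential(0.5, rng) for _ in range(2000)]    # p < 1

mean_abs_laplace = sum(abs(x) for x in laplace_draws) / len(laplace_draws)
mean_abs_heavy = sum(abs(x) for x in heavy_draws) / len(heavy_draws)
```

With $p = 1$ the expected magnitude is 1, while with $p = 0.5$ it is 6, so the heavier tail shows up clearly in the sample means.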

2606.19524 2026-06-19 physics.ed-ph hep-ph New submission 85%

Vistas: A Visualization Interface for Particle Collision Simulations

Benoit Assi, Christan Bierlich, Rikab Gambhir, Philip Ilten, Tony Menzo, Stephen Mrenna, Manuel Szewc, Michael K. Wilkinson, Ahmed Youssef, Jure Zupan

Topic match (Other AI for Science): visualization of particle collision simulations for physics education.

AI summary: Presents Vistas, a tool built on the browser-based event display framework Phoenix that visualizes each stage of high-energy particle collisions simulated by Pythia, showing particles in an interactive 3D graph structure with rotation, zoom, and filtering; suited to physics education.

Comments 20 pages, 9 figures, public code available

Abstract

We introduce Vistas, a tool for visualizing high-energy particle physics collisions simulated by the Pythia Monte-Carlo event generator. Vistas utilizes the browser-based event display framework Phoenix to show distinct computational stages of a high-energy collision event simulation: the hard process, parton shower, hadronization, and particle decays. Particles produced from each of these stages are represented as lines in an interactive three-dimensional graph structure, where each line is along the direction of its particle's three-momentum vector. The event can be rotated, translated and zoomed, and details for each particle can be accessed by selecting the relevant particle line. Additionally, particle lines from all stages of the simulation can be toggled on and off and can be filtered by particle-level kinematic selection requirements. This interactive environment provides an intuitive interpretation of Pythia simulation output, including detailed features such as color flow, beam remnants, and multiple parton interactions, making it a useful tool in physics education settings, from outreach activities to graduate particle-physics courses.
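The geometry behind the display can be sketched in a few lines: each line points along the particle's three-momentum, and a kinematic filter keeps only particles passing a cut. This is not the Vistas or Phoenix API; the dictionary layout, the transverse-momentum cut of 1.0, and the momenta are hypothetical.

```python
import math

def direction(px: float, py: float, pz: float):
    """Unit vector along the three-momentum (assumes non-zero momentum)."""
    norm = math.sqrt(px * px + py * py + pz * pz)
    return (px / norm, py / norm, pz / norm)

def transverse_momentum(px: float, py: float) -> float:
    """Momentum component perpendicular to the beam (z) axis."""
    return math.hypot(px, py)

# Hypothetical particle records: PDG id and (px, py, pz).
particles = [{"id": 211, "p": (3.0, 4.0, 0.0)}, {"id": 22, "p": (0.1, 0.1, 5.0)}]
visible = [q for q in particles
           if transverse_momentum(q["p"][0], q["p"][1]) > 1.0]
line_direction = direction(*particles[0]["p"])
```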

2606.20437 2026-06-19 hep-ex cs.LG New submission 85%

HEPTv2: End-to-End Efficient Point Transformer for Charged Particle Reconstruction

Siqi Miao, Shitij Govil, Jack P. Rodgers, Mia Liu, Javier Duarte, Shih-Chieh Hsu, Yuan-Tang Chou, Pan Li

Affiliations: School of Electrical and Computer Engineering, Georgia Institute of Technology; Department of Physics and Astronomy, Purdue University; Department of Physics, University of California San Diego; Department of Physics, University of Washington

Topic match (Other AI for Science): point transformer for particle-physics track reconstruction.

AI summary: Presents HEPTv2, an end-to-end point-transformer architecture that uses locality-sensitive-hashing encoding and sectorized decoding to reconstruct particle tracks directly from detector hits without graph construction, reaching 98.6% tracking efficiency at a 0.8% fake rate on TrackML with about 15 ms latency.

Abstract

Charged-particle tracking -- reconstructing trajectories from sparse detector measurements -- is a fundamental high-energy-physics inference problem and a canonical example of learning under extreme combinatorial ambiguity. At the High-Luminosity Large Hadron Collider (HL-LHC), tracking must remain accurate and efficient despite unprecedented collision densities. Graph neural networks perform strongly, but incur substantial costs from graph construction and processing, while transformer-based approaches rely on auxiliary stages that prevent end-to-end optimization. To address this, we present HEPTv2, an end-to-end point-transformer architecture that reconstructs tracks from detector hits in one trainable pipeline. HEPTv2 combines a locality-aware point encoder with a track decoder that predicts complete trajectories without graph-building, clustering, or filtering. The encoder uses locality-sensitive hashing in detector coordinate space to preserve tracking-relevant geometry while enabling efficient local attention. The decoder resolves ambiguities through sectorized decoding and direct hit-to-track prediction under joint encoder-decoder supervision, allowing the full pipeline to be optimized end-to-end. On TrackML, HEPTv2 achieves 98.6% double-majority tracking efficiency at a 0.8% fake rate, while requiring only $\sim$15 ms inference time and 0.4 GB peak memory per event on an NVIDIA A100 GPU. Latency and memory scale approximately linearly for events with up to $5\times10^5$ hits. HEPTv2 establishes a new state of the art in the accuracy-latency trade-off, improving efficiency by 4.5% over the strongest prior transformer and by 1.1--2.2% over optimized graph-based pipelines, while reducing latency by factors of 7 and 38--52, respectively. These results show end-to-end transformers can deliver the accuracy and efficiency required for real-time particle reconstruction at the HL-LHC.
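The role of locality-sensitive hashing can be sketched with a generic random-projection hash: hits are projected onto a random direction and cut into buckets of fixed width, so nearby hits tend to share a bucket. The abstract does not describe HEPTv2's actual hash functions, so this generic scheme, the bucket width, and the coordinates are assumptions.

```python
import math
import random
from collections import defaultdict

def make_hash(dim: int, width: float, rng: random.Random):
    """Build one hash of the form floor((a . x + b) / width)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random projection direction
    b = rng.uniform(0.0, width)                    # random offset
    def bucket(point):
        projection = sum(ai * xi for ai, xi in zip(a, point))
        return math.floor((projection + b) / width)
    return bucket

rng = random.Random(0)
bucket_of = make_hash(dim=3, width=1.0, rng=rng)

hits = [(0.10, 0.20, 0.30), (0.11, 0.21, 0.29), (5.0, -3.0, 2.0)]  # made-up hits
buckets = defaultdict(list)
for index, hit in enumerate(hits):
    buckets[bucket_of(hit)].append(index)
```

Attention restricted to hits within the same bucket costs far less than attention over every pair of hits, which is the efficiency motivation for hashing.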

2606.20191 2026-06-19 stat.ML stat.ME New submission 80%

AK-MCS-C2: Active Kriging Monte Carlo Simulation method with conformal certification for failure probability estimation

Edgar Jaber, Vincent Chabridon, Mathilde Mougeot

Topic match (Other AI for Science): active-learning framework for failure probability estimation in structural reliability.

AI summary: Proposes an active-learning framework that combines Active Kriging Monte Carlo simulation with conformal prediction; an adaptive cross-conformal strategy and the J+GP conformal estimator give distribution-free guarantees on prediction errors in small-sample settings, making classification of samples near the limit-state surface more reliable and failure probability estimates more accurate and robust.

Abstract

We introduce a novel active-learning framework for failure probability estimation in structural reliability analysis that integrates Active Kriging Monte Carlo simulation with conformal prediction. The proposed approach employs an adaptive cross-conformal strategy specifically designed for small-sample settings and kriging surrogate models using the J+GP conformal estimator. Unlike standard AK-MCS methods, the proposed framework provides distribution-free guarantees on prediction errors, leading to more reliable classification of samples near the limit-state surface. This improved uncertainty quantification enhances both the accuracy and robustness of failure probability estimates, especially for rare-event regimes where such efficiency is crucial. Reproducible numerical results illustrate the effectiveness of the method and also compare it to classical approaches on well-established benchmarks.
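For context, the two standard AK-MCS ingredients that the framework builds on can be sketched as follows: the U learning function, which ranks Monte Carlo samples by how uncertain the sign of the surrogate prediction is, and the failure probability estimate as the fraction of samples predicted to fail. The conformal part of the paper is not shown, failure is taken as a prediction at or below zero by convention, and the means and standard deviations are made-up stand-ins for kriging output.

```python
def u_values(means, stds):
    """U(x) = |mean(x)| / std(x); small U means the predicted sign is uncertain."""
    return [abs(m) / s for m, s in zip(means, stds)]

def failure_probability(means):
    """Fraction of Monte Carlo samples whose predicted performance is <= 0."""
    return sum(1 for m in means if m <= 0.0) / len(means)

means = [1.5, -0.2, 0.4, -2.0]  # hypothetical surrogate means at four samples
stds = [0.5, 0.4, 0.1, 0.5]     # hypothetical surrogate standard deviations
u = u_values(means, stds)
next_index = min(range(len(u)), key=u.__getitem__)  # sample to evaluate next
pf = failure_probability(means)
```

The sample with the smallest U lies closest to the limit-state surface relative to its uncertainty, so evaluating the true model there is most informative.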

2606.19540 2026-06-19 stat.ME stat.CO stat.ML New submission 80%

Overfitted high-dimensional matrix factorizations via adaptive spectral shrinkage

Lorenzo Mauri, David B. Dunson

Topic match (Other AI for Science): proposes the EigenBayes method for high-dimensional factor models, with a genomics application.

AI summary: Proposes EigenBayes, which uses spectral estimation and adaptive empirical Bayes calibration of hyperparameters to fit over-fitted factor models quickly with uncertainty quantification, and outperforms existing methods in numerical experiments and a genomics application.

Abstract

Factor models are popular approaches for analyzing high-dimensional data to extract low-rank signals and estimate covariances. They decompose the covariance matrix as the sum of low-rank and diagonal components. A key issue is how to choose the latent dimension $k$, which is particularly challenging when the factor model only holds approximately and in low signal-to-noise scenarios. Bayesian overfitted factor models specify an upper bound on $k$ and rely on structured shrinkage priors to effectively remove extra components. Such approaches are popular and effective, but computationally expensive. We propose a much faster EigenBayes approach that provides valid uncertainty quantification, based on spectral estimation of latent factors and adaptive empirical Bayes calibration of key hyperparameters. The resulting posterior distribution factorizes across outcomes and is analytically tractable, bypassing Markov chain Monte Carlo. We show that EigenBayes adapts to the signal-to-noise ratio of each outcome and latent dimension, while shrinking superfluous latent components to zero. We establish favorable asymptotic properties and demonstrate strong empirical performance in numerical experiments and a genomics application, where EigenBayes outperforms state-of-the-art alternatives.

2606.19739 2026-06-19 q-bio.NC New submission 80%

Robust probabilistic measurement of structural-functional module consistency in infant brain development

Lingbin Bian, Feihong Liu, Qian Wang, Han Zhang, Dinggang Shen, the UNC/UMN Baby Connectome Project Consortium

Topic match (Other AI for Science): probabilistic measurement of structural-functional consistency in infant brain networks.

AI summary: Proposes a probabilistic method based on stochastic modules to robustly measure structural-functional module consistency in the infant brain, finding that consistency decreases from 0 to 5 years of age and is higher in primary brain regions.

Abstract

Brain networks are commonly divided into modules to analyze their functionally segregated roles in group-level neuroimaging studies. Here, we introduce stochastic modules within brain networks for a robust probabilistic measurement of structural-functional module consistency (SFMC) in a group of subjects. Specifically, a stochastic module can be regarded as the chance that a brain region is assigned, across subjects, to a group-level sub-network, characterized as an assignment probability for this brain region. This novel method has two advantages for evaluating inhomogeneous modules in brain networks. The first is that it can robustly evaluate the consistency between brain structural and functional modules whose population sizes are not necessarily the same, and the second is that it is able to take into account the inter-individual variability of the modules within each group. Moreover, compared with the conventional structural-functional coupling approach, our stochastic module-based method reveals a more pronounced decline in the coupling between structure and function, indicating stronger developmental reorganization. Our results using the dataset from the Baby Connectome Project (BCP) show that the SFMC decreases from 0 to 5 years of age, and is greater in primary brain regions, such as visual areas, while lower in more advanced cognitive regions, including those related to attention, control, and the default mode network.
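The notion of an assignment probability can be illustrated with simple frequencies: for each region, the fraction of subjects in which it falls into each module. This assumes module labels are already aligned across subjects and uses toy labels; the abstract does not say how the paper estimates these probabilities, so the frequency estimator is an assumption.

```python
from collections import Counter

def assignment_probabilities(labels_per_subject):
    """labels_per_subject[s][r] is the module of region r in subject s.

    Returns, for each region, a dict mapping module -> fraction of subjects.
    """
    n_subjects = len(labels_per_subject)
    n_regions = len(labels_per_subject[0])
    probabilities = []
    for region in range(n_regions):
        counts = Counter(subject[region] for subject in labels_per_subject)
        probabilities.append({m: c / n_subjects for m, c in counts.items()})
    return probabilities

# Four subjects, three regions, two modules (toy data).
labels = [[0, 0, 1], [0, 1, 1], [0, 1, 1], [0, 0, 1]]
probs = assignment_probabilities(labels)
```

Region 1 is split evenly between the two modules here, which is the kind of inter-individual variability a hard group-level partition would hide.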

2606.19560 2026-06-19 cs.LG New submission 80%

Understanding Key Features of Time Series Foundation Models from Epidemic Forecasting

Alireza Jafari, Judy Fox, Geoffrey C. Fox, Madhav Marathe, Aniruddha Adiga

Affiliations: Department of Computer Science, School of Engineering and Applied Science, University of Virginia; School of Data Science, University of Virginia; Biocomplexity Institute, University of Virginia; Department of Electrical and Computer Engineering, School of Engineering and Applied Science, University of Virginia

Topic match (Other AI for Science): evaluates time series models for epidemic forecasting.

AI summary: Systematically evaluates time series models for influenza forecasting and finds that a mixture-of-experts model performs best, pretraining yields the largest gains at longer horizons, and LLM-based methods perform worse.

Comments 15 pages, 2 figures, 9 tables

Abstract

Seasonal influenza infects millions of people and causes substantial morbidity and mortality in the United States each year, making accurate short-term forecasting a core public-health need. Reliable forecasts of epidemic time series can inform vaccination timing, hospital staffing, and resource allocation, yet the comparative behavior of modern forecasting architectures on infectious-disease surveillance data remains insufficiently characterized. We address this gap through a systematic evaluation of regional influenza forecasting using influenza-like illness surveillance and influenza-associated hospitalization time series under both temporal and spatial generalization settings for 1-4-week-ahead prediction. We compare classical neural network architectures, numerical transformer-based models, pretrained time series foundation models, and LLM-based forecasting approaches. Across tasks, we demonstrate that a mixture-of-experts model that fuses multiple pretrained forecasters achieves the strongest overall performance, indicating that heterogeneous pretrained representations provide complementary predictive information. Our results further show that numerical transformer-based models produce reliable forecasts, while pretraining provides the largest gains at longer horizons, particularly when the pretraining domain is mechanistically aligned with influenza dynamics. In contrast, LLM-based time series methods underperform relative to numerical forecasters in this setting. Finally, we examine hospitalization information as both an auxiliary covariate and a pretraining source. Hospitalization signals provide complementary improvements in selected settings and clarify when additional surveillance streams enhance the robustness of multi-horizon forecasting. These findings provide actionable guidance on model selection, pretraining strategy, and auxiliary-signal use for influenza preparedness.
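Fusing several forecasters in a mixture-of-experts style can be sketched as a softmax-weighted average of their forecasts. The abstract does not describe the gating mechanism, so the fixed gate scores, the two experts, and the forecast values below are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(forecasts, gate_scores):
    """Weighted average of expert forecasts, one value per horizon."""
    weights = softmax(gate_scores)
    horizon = len(forecasts[0])
    return [sum(w * f[h] for w, f in zip(weights, forecasts))
            for h in range(horizon)]

# Two hypothetical experts forecasting 1 to 4 weeks ahead.
expert_a = [100.0, 110.0, 120.0, 130.0]
expert_b = [80.0, 90.0, 100.0, 110.0]
fused = fuse([expert_a, expert_b], gate_scores=[0.0, 0.0])
```

Equal gate scores reduce to a plain average; a learned gate would shift weight toward whichever expert is more reliable for a given region or horizon.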

2606.19761 2026-06-19 cs.LO math.LO 新提交 80%

Finishing Oltean's Completeness Proof in Lean 4 for Hybrid Logic $L(\forall)$

在 Lean 4 中完成 Oltean 关于混合逻辑 $L(\forall)$ 的完备性证明

Lars Warren Ericson

专题命中 其他科学智能 :在Lean4中完成混合逻辑完备性证明

AI总结 本文在 Lean 4 中完成了混合逻辑 $L(\forall)$ 的机器检查完备性证明,通过结构新鲜性和存在引理 Henkin 构造两种工具解决了新鲜名称的生成问题。

Comments 147 pages, 5 figures

详情
AI中文摘要

我们给出了一个在 Lean 4 中机器检查的完备性定理,针对混合逻辑 $L(\forall)$:带有名义词、满足风格绑定器 $\forall$ 和盒子模态的命题模态逻辑。(基本混合逻辑(无绑定器)的机器检查完备性由 Asta Halkjær From 在 Isabelle/HOL 中开创。)我们基于 Alex Oltean 2023 年的 Lean 4 形式化工作,该工作遵循 Blackburn 的 Hybrid Completeness(1998)机械化了语法、语义、希尔伯特风格证明系统和可靠性,但未完成完备性证明。完成它需要在两个结构不同的点上制造新鲜名称,我们的核心发现是它们需要两种不同的工具。(1)通过扩展的 Lindenbaum 构造构建的根部带见证最大一致集,每一步都需要一个对整个集合新鲜的名义词;正确的工具是结构新鲜性:扩展语言,使得通过构造保留无限的名义词供应。我们调查了设计空间(Oltean 在 $\mathbb{N}$ 内的奇偶编码、Bud Mishra 建议的不交和 $N \oplus \mathbb{N}$ 参数化,以及 From 的合成完备性框架)并解释了我们采用的编码。(2)一个最大一致集的带见证 $\Diamond$-后继不能通过这种方式获得:其典范盒子归约可证明地提及每个名义词,因此没有保留的名称是新鲜的。这里正确的工具是 Oltean 选择但未完成的:一个存在引理 Henkin 构造,通过一个新鲜状态变量从前驱的带见证性中抽取每个见证;我们通过一个携带数据的见证累加器和一个紧致性论证完成了它。定理 $\Gamma \models \varphi \to \Gamma \vdash \varphi$ 被完全形式化:该开发是无 sorry 的,且 #print axioms 仅报告 propext、Classical.choice 和 Quot.sound。我们将开发移植到 Lean v4.30.0 / mathlib v4.30.0。

英文摘要

We present a machine-checked completeness theorem, in Lean 4, for the hybrid logic $L(\forall)$: propositional modal logic with nominals, the satisfaction-style binder $\forall$, and the box modality. (Machine-checked completeness for basic hybrid logic, without binders, was pioneered by Asta Halkjær From in Isabelle/HOL.) We build on Alex Oltean's 2023 Lean 4 formalization, which mechanized the syntax, semantics, Hilbert-style proof system, and soundness following Blackburn's Hybrid Completeness (1998), but left completeness unfinished. Finishing it requires manufacturing fresh names at two structurally different points, and our central finding is that they call for two different tools. (1) The root witnessed maximal consistent set, built by an extended Lindenbaum construction, needs at each step a nominal fresh for the whole set; the right tool is structural freshness: extend the language so an infinite supply of nominals is reserved by construction. We survey the design space (Oltean's odd/even encoding inside $\mathbb{N}$, the disjoint-sum $N \oplus \mathbb{N}$ parameterization suggested by Bud Mishra, and From's synthetic-completeness frameworks) and explain the encoding we adopt. (2) The witnessed $\Diamond$-successor of a maximal consistent set cannot be obtained this way: its canonical box-reduct provably mentions every nominal, so no reserved name is fresh. Here the right tool is one Oltean chose but left incomplete: an existence-lemma Henkin construction drawing each witness from the predecessor's witnessedness through a fresh state variable; we complete it with a data-carrying witness accumulator and a compactness argument. The theorem $\Gamma \models \varphi \to \Gamma \vdash \varphi$ is fully formalized: the development is sorry-free, and #print axioms reports only propext, Classical.choice, and Quot.sound. We port the development to Lean v4.30.0 / mathlib v4.30.0.
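
The axiom audit mentioned in the abstract can be illustrated with a minimal Lean 4 sketch. It is not taken from the paper's development: the theorem name `demo` is hypothetical, and the example only shows how `#print axioms` is invoked on a classically proved statement.

```lean
-- Hypothetical example, unrelated to the paper's actual development.
-- A classical tautology proved with `Classical.em` from the Lean 4 core library.
theorem demo (p : Prop) : p ∨ ¬p := Classical.em p

-- Lists every axiom that `demo` transitively depends on.
#print axioms demo
```

A development that uses `sorry` would additionally list `sorryAx` in this report, which is why the audit is a common way to confirm that a formalization is sorry-free.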

2606.19405 2026-06-19 q-bio.QM math.DS q-bio.PE 新提交 80%

Multi-type branching inference on contact trees with application to COVID-19

接触树上的多类型分支推断及其在COVID-19中的应用

Augustine Okolie, Johannes Müller, Eno Akarawakc, Isaac Ajiboye

专题命中 其他科学智能 :提出接触树上的多类型分支推断方法

AI总结 提出一种直接作用于接触树上传播树的似然框架,通过多类型分支过程考虑接触度异质性,从部分解析的传播树中推断流行病学参数,并在COVID-19接触追踪数据中验证。

Comments 26 pages, 8 figures

详情
AI中文摘要

从传播树推断流行病学参数对于理解传染病动态至关重要。现有的基于树的似然方法,包括最初应用于系统发育动力学环境中的多类型出生-死亡模型,提供了强大的工具,但大多数假设均匀混合,很少捕捉当个体感染更多接触者时传播潜力的变化。在这项工作中,我们开发了一个直接作用于传播树的似然框架,其中节点是个体,边是报告的传播事件,不涉及序列数据。我们推导了一个在有根接触树上的随机SIR过程的似然,其中每个感染个体由有效接触总数和已感染的下游接触数来刻画。我们得到了闭式常微分方程,分别描述一个支系完全未被观察到的概率,以及它产生一个处于给定状态的观察(采样)末端的概率密度。对于已知末端状态的有根接触树,可以评估得到的似然,并且我们通过将内部分支时间视为潜在变量,将其扩展到部分解析的树。在模拟爆发上的验证确认了准确的参数恢复和良好校准的不确定性。应用于印度卡纳塔克邦的经验COVID-19接触追踪数据,展示了该框架在实际流行病学环境中的实用性。通过在多类型分支似然中纳入接触度异质性,我们的工作为从完全或部分解析的传播树推断传播动态和接触结构提供了一个原则性的基线,补充而非依赖于基于序列的系统发育动力学推断。

英文摘要

Inferring epidemiological parameters from transmission trees is essential for understanding infectious disease dynamics. Existing tree-based likelihood methods, including the multi-type birth-death models originally applied in phylodynamic settings, provide powerful tools, but most assume homogeneous mixing and rarely capture how transmission potential changes as an individual infects more of their contacts. In this work, we develop a likelihood framework that operates directly on transmission trees, in which nodes are individuals and edges are reported transmission events, with no sequence data involved. We derive a likelihood for a stochastic SIR process on a rooted contact tree in which each infected individual is characterised by the total number of effective contacts and the number of already-infected downstream contacts. We obtain closed-form ordinary differential equations for the probability that a clade goes entirely unobserved and for the probability density that it produces an observed (sampled) tip in a given state. The resulting likelihood can be evaluated for a rooted contact tree with known tip states, and we extend it to partially resolved trees by treating internal branching times as latent variables. Validation on simulated outbreaks confirms accurate parameter recovery and well-calibrated uncertainty. Application to empirical COVID-19 contact-tracing data from Karnataka, India, demonstrates the framework's utility for real epidemiological settings. By incorporating contact-degree heterogeneity in a multi-type branching likelihood, our work provides a principled baseline for inferring both transmission dynamics and contact structure from fully or partially resolved transmission trees, complementing rather than relying on sequence-based phylodynamic inference.

2606.20534 2026-06-19 math.OC 新提交 80%

On Second-Order Methods for Bilevel Optimization

关于双层优化的二阶方法

Jiawen Bi, Jiaxiang Li, Mingyi Hong, Shuzhong Zhang

专题命中 其他科学智能 :提出双层优化二阶方法,达最优复杂度

AI总结 本文针对双层优化问题,提出了一种单循环三次正则牛顿算法,在非凸上层和强凸下层设置下,实现了最优的O(ε^{-1.5})总预言复杂度,首次达到二阶驻点的最优收敛率。

详情
AI中文摘要

双层优化是现代机器学习和工程设计不可或缺的建模工具。然而,在双层优化中寻找二阶驻点的理论和实践仍然很大程度上未解决。即使对于具有强凸下层问题的双层优化,其诱导的超函数通常是非凸的。尽管三次正则牛顿方法(CRN)在单层优化中实现了最优的$\mathcal{O}(\varepsilon^{-1.5})$ SOSP(二阶驻点)率,但如何控制将二阶方法应用于双层问题时超梯度和超Hessian计算的精度,以使整个过程高效,仍不清楚。在本文中,我们着手回答这个问题。特别地,我们首先制定了一个双循环CRN基线,该基线实现了最优的外层率,但需要重复的下层求解。接下来,我们提出了一种单循环三次正则牛顿算法,该算法将一个下层梯度步与一个用于超梯度的牛顿步相结合,并证明了总体确定性的$\mathcal{O}(\varepsilon^{-1.5})$总预言复杂度,这是最优的。此外,我们说明了一些直观简单的修改可能无法维持收敛结果。据我们所知,这是第一个用于无约束NCSC(非凸上层和强凸下层)双层优化设置的确定性单循环方法,该方法实现了寻找超函数$\varepsilon$-SOSP的$\mathcal{O}(\varepsilon^{-1.5})$最优收敛率。

英文摘要

Bilevel optimization is an indispensable modeling tool for modern machine learning and engineering design. However, the theory and practice of finding second-order stationary points in bilevel optimization remain largely unsettled. Even for bilevel optimization with a strongly convex lower-level problem, the hyperfunction it induces is in general nonconvex. Although Cubic Regularized Newton (CRN) methods famously achieve the optimal $\mathcal{O}(\varepsilon^{-1.5})$ SOSP (second-order stationary point) rate in single-level optimization, it is unclear how to control the accuracy of the hypergradient and hyper-Hessian computations when applying second-order methods to bilevel problems so that the overall process is efficient. In this paper, we set out to answer this question. In particular, we first formulate a double-loop CRN baseline that achieves the optimal outer rate but requires repeated lower-level solves. Next, we propose a single-loop cubic regularized Newton algorithm that combines one lower-level gradient step with one Newton step for the hypergradient, and prove an overall deterministic $\mathcal{O}(\varepsilon^{-1.5})$ total oracle complexity, which is optimal. In addition, we illustrate that some intuitively simple modifications of our method may fail to preserve the convergence result. To the best of our knowledge, this is the first deterministic single-loop method for the unconstrained NCSC (nonconvex upper-level and strongly convex lower-level) bilevel optimization setting that achieves the optimal $\mathcal{O}(\varepsilon^{-1.5})$ convergence rate for finding an $\varepsilon$-SOSP of the hyperfunction.
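
For orientation, the standard objects behind this abstract can be written out as follows. This is a textbook-style sketch, not the paper's own notation: the symbols $f$, $g$, $\Phi$, $y^*(x)$, $M$ and $s$ are assumptions introduced here, and the stationarity thresholds are stated only up to constants.

```latex
% Bilevel problem with a strongly convex lower level (notation assumed here)
\[
  \min_{x}\ \Phi(x) := f\bigl(x, y^*(x)\bigr),
  \qquad
  y^*(x) = \arg\min_{y}\ g(x, y).
\]
% Hypergradient obtained from the implicit function theorem
\[
  \nabla \Phi(x)
  = \nabla_x f\bigl(x, y^*(x)\bigr)
  - \nabla^2_{xy} g\bigl(x, y^*(x)\bigr)
    \bigl[\nabla^2_{yy} g\bigl(x, y^*(x)\bigr)\bigr]^{-1}
    \nabla_y f\bigl(x, y^*(x)\bigr).
\]
% Cubic-regularized Newton step applied to the hyperfunction
\[
  s_k \in \arg\min_{s}\
  \langle \nabla \Phi(x_k), s \rangle
  + \tfrac{1}{2} \langle \nabla^2 \Phi(x_k)\, s, s \rangle
  + \tfrac{M}{6} \lVert s \rVert^3,
  \qquad
  x_{k+1} = x_k + s_k.
\]
% epsilon-SOSP of the hyperfunction (constants omitted)
\[
  \lVert \nabla \Phi(x) \rVert \le \varepsilon,
  \qquad
  \lambda_{\min}\bigl(\nabla^2 \Phi(x)\bigr) \ge -\sqrt{\varepsilon}.
\]
```

The difficulty the abstract points to is that $y^*(x_k)$, and hence $\nabla \Phi(x_k)$ and $\nabla^2 \Phi(x_k)$, are only available approximately, so the cubic step is always taken with inexact derivative information.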

2606.20329 2026-06-19 cs.LG physics.geo-ph 新提交 80%

Constrained hybrid modelling to predict microbial dynamics and organic matter turnover in soil systems

约束混合建模预测土壤系统中微生物动态与有机质周转

Paul Collart, Juergen Gall, Andrea Schnepf, Holger Pagel, Lars Doorenbos

发表机构 * Agrosphere (IBG-3), Forschungszentrum Jülich GmbH(农业圈(IBG-3),于利希研究中心) Institute of Crop Science and Resource Conservation, University of Bonn(波恩大学作物科学与资源保护研究所) Institute of Computer Science, University of Bonn(波恩大学计算机科学研究所) Lamarr Institute for Machine Learning and Artificial Intelligence(拉马尔机器学习和人工智能研究所)

专题命中 其他科学智能 :土壤微生物建模,环境科学机器学习

AI总结 提出首个混合建模框架,利用神经网络从宏基因组推断功能性状预测过程模型参数,并整合生态理论约束,有效预测微生物动态和有机质周转。

Comments Accepted at ICML '26

详情
AI中文摘要

土壤微生物控制有机质循环,并在很大程度上决定土壤系统如何应对和缓解气候变化及环境威胁。因此,在基于过程的土壤模型中表示微生物动态对于预测土壤碳循环至关重要,尽管从数据中获取信息极具挑战性。改进参数化的一个有前景的方法是整合基因组数据,然而建模基因组与微生物驱动过程之间复杂且未知的关系是一个未解决的问题。在这项工作中,我们提出了第一个混合建模框架,用于从基于DNA测序数据的宏基因组推断功能性状中推导基于过程的土壤有机质周转模型的生物动力学参数值。我们的模型通过神经网络从基因组性状数据预测过程模型的生物动力学参数,并整合来自生态理论和文献的约束,以确保即使是非观测状态变量也能实现逼真的行为。我们在不同复杂度的合成基因组性状数据集和真实数据上评估了我们的方法,结果表明,我们的方法在多个基线上提高了性能,并有效学习了过程模型中不可测量组分的动态,即使是在小训练数据集上也是如此。

英文摘要

Soil microorganisms control organic matter cycling and largely determine how soil systems can cope with and mitigate climate change and environmental threats. Representing microbial dynamics in process-based soil models is therefore critical to predict carbon cycling in soils, albeit highly challenging to inform from data. One promising approach to improve their parametrisation is the integration of genomic data, yet modelling the complex and unknown relationship between genomes and the processes the microbes are driving is an unsolved problem. In this work, we present the first hybrid modeling framework for deriving biokinetic parameter values of a process-based soil organic matter turnover model from metagenome-inferred functional traits based on DNA sequencing data. Our model predicts biokinetic parameters of the process-based model from genomic trait data with a neural network and integrates constraints from ecological theory and literature to ensure realistic behavior, even of non-observed state variables. We evaluate our method on synthetic genomic trait datasets of varying complexity and on real data, showing that our approach improves performance over multiple baselines and learns the dynamics of unmeasurable components of the process-based model effectively, even for small training datasets.

2606.20145 2026-06-19 q-fin.ST cond-mat.stat-mech physics.data-an q-fin.MF q-fin.RM 新提交 80%

Trends, Volatility, Correlations, and Critical Phenomena in Financial Markets

金融市场中的趋势、波动率、相关性和临界现象

Sara A. Safari, Christoph Schmidhuber

专题命中 其他科学智能 :金融市场趋势与波动率预测,属于经济物理

AI总结 基于当前市场趋势预测未来波动率和相关性,发现趋势强度与波动率、相关性呈二次关系,改进风险预测并支持临界点晶格气体模型。

Comments 31 pages, 9 figures

详情
AI中文摘要

我们基于金融市场的当前趋势预测未来的波动率和相关性。这补充了先前的工作,该工作通过当前趋势强度的三次多项式来建模未来预期收益。经验上,我们观察到在强烈上升或下降趋势期间,波动率和相关性往往逐日增加。这种效应在下降趋势中尤为显著。它可以通过当前趋势强度的二次多项式精确量化,这细化了波动率和相关性的常见均值回归模型。我们的结果通过考虑市场趋势改进了市场风险的预测。它们也支持最近一项将金融市场建模为接近其临界点的晶格气体的提议。

英文摘要

We forecast future volatilities and correlations of financial markets based on the current trends in these markets. This complements previous work that models future expected returns by a cubic polynomial of the current trend strength. Empirically, we observe that volatilities and correlations tend to increase day after day in times of strong up- or down-trends. This effect is particularly pronounced in down-trends. It can be accurately quantified by quadratic polynomials of today's trend strengths, which refine common mean-reversion models of volatilities and correlations. Our results improve the prediction of market risk by accounting for market trends. They also support a recent proposal to model financial markets by a lattice gas near its critical point.
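
The quadratic dependence described above can be sketched as an ordinary least-squares fit. The data below are synthetic and the coefficient values are arbitrary assumptions chosen for illustration; the function name `fit_quadratic_in_trend` is hypothetical and the snippet does not reproduce the paper's estimation procedure.

```python
import numpy as np


def fit_quadratic_in_trend(trend, target):
    """Least-squares fit of target ~ a + b*trend + c*trend**2; returns (a, b, c)."""
    # np.polyfit returns coefficients from the highest degree down.
    c, b, a = np.polyfit(np.asarray(trend, dtype=float),
                         np.asarray(target, dtype=float), deg=2)
    return a, b, c


# Synthetic illustration only: the "true" coefficients are made up.
rng = np.random.default_rng(0)
phi = rng.normal(size=5000)                 # stand-in for today's trend strength
true_a, true_b, true_c = 1.0, -0.1, 0.05    # b < 0 makes down-trends count more
vol = true_a + true_b * phi + true_c * phi**2 + 0.01 * rng.normal(size=phi.size)

a, b, c = fit_quadratic_in_trend(phi, vol)
print(f"a={a:.3f}, b={b:.3f}, c={c:.3f}")
```

A positive `c` encodes the rise of volatility in strong trends of either sign, while a negative `b` is one simple way to express the stronger effect in down-trends.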

2606.19860 2026-06-19 physics.comp-ph cond-mat.stat-mech physics.soc-ph 新提交 80%

The Heat Kernel Expansion: Curvature for Shock Detection in Higher-Order Financial Networks

热核展开:高阶金融网络中的曲率用于冲击检测

Mohammad Elsayed, Sara Najem

专题命中 其他科学智能 :热核展开检测金融网络冲击,属于经济物理

AI总结 本文通过热核展开系数定义曲率,用于检测高阶金融网络中的冲击,发现曲率比欧拉示性数和挠率更敏感地捕捉法律变化的影响。

详情
AI中文摘要

本研究跟踪了挪威金融网络在九年期间每月的变化。数据包括董事会成员及其与公司的关联,我们将其建模为单纯复形。在此框架中,董事表示为节点,公司表示为复形的面。为了表征后者,我们关注三个拓扑度量:通过贝蒂数计算的欧拉示性数、通过高阶拉普拉斯矩阵的简化行列式计算的挠率,以及高阶聚类系数。前两者未能捕捉到法律对代表权的影响,而我们的曲率概念则不同,它是通过热核在时间幂次上的级数展开系数计算的几何度量,这是本工作的主要贡献。特别地,欧拉示性数积分了曲率,因此局部信息丢失。随后,并非所有拓扑度量都能可靠地捕捉网络中的冲击。此外,生成树的数量可能在最低阶发生显著变化,但这些变化不一定反映在挠率中。相反,曲率的变化揭示了因立法导致的董事会连锁变化,并作为检测网络中冲击的敏感度量。曲率的拐点与外部强迫相关,最小值与冲击到达时间相关。在挠率的分量中也观察到尖锐转变,而在高阶聚类中观察到平滑变化。

英文摘要

This work follows the evolution of financial networks in Norway over a period of nine years at a monthly rate. The data consist of board directors and their affiliations to companies, which we model as simplicial complexes. In this framework, directors are represented as nodes and companies as faces of the complex. To characterize the latter, we focus on three topological measures: the Euler characteristic, computed through the Betti numbers; torsion, computed through the reduced determinant of the higher-order Laplacians; and higher-order clustering coefficients. The first two fail to capture the effect of imposed law on representation. In contrast, our notion of curvature, a geometrical measure computed from the coefficients of the series expansion of the heat kernel in powers of time and the major contribution of this work, does capture it. In particular, the Euler characteristic integrates curvature, and thus local information is lost. Consequently, not every topological measure can reliably capture shocks in networks. Further, the number of spanning trees may undergo significant changes at the lowest order, yet these changes need not be reflected in the torsion. Conversely, the change in the curvature reveals variation in the board interlock due to legislation, and serves as a sensitive measure for detecting shocks in networks. Inflection points in curvature are associated with external forcing, and minima with shock arrival times. Sharp transitions are also observed in the components of torsion, while smooth changes are observed in higher-order clustering.
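
The series expansion of the heat kernel in powers of time can be sketched for an ordinary graph Laplacian. This is a simplified illustration under stated assumptions: the paper works with higher-order Laplacians on simplicial complexes and defines its curvature from the expansion coefficients in a way the abstract does not spell out, so the function `heat_kernel_coefficients` and the 3-node path graph are hypothetical stand-ins.

```python
import math

import numpy as np


def heat_kernel_coefficients(L, order):
    """Return [C_0, ..., C_order] with C_k = (-1)^k L^k / k!,
    so that exp(-t L) = sum_k C_k t^k."""
    L = np.asarray(L, dtype=float)
    coeffs = []
    power = np.eye(L.shape[0])          # L^0
    for k in range(order + 1):
        coeffs.append(((-1) ** k / math.factorial(k)) * power)
        power = power @ L               # advance to L^(k+1)
    return coeffs


# Path graph on three nodes: 0 - 1 - 2.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

coeffs = heat_kernel_coefficients(L, order=20)
t = 0.3
series = sum(C * t**k for k, C in enumerate(coeffs))

# Reference value from the eigendecomposition of the symmetric Laplacian.
w, V = np.linalg.eigh(L)
exact = V @ np.diag(np.exp(-t * w)) @ V.T
print(np.max(np.abs(series - exact)))
```

The diagonal of each coefficient matrix is a per-node quantity (for `coeffs[1]` it is minus the degree), which is the sense in which expansion coefficients carry local information that a single global invariant averages away.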

2606.16803 2026-06-19 q-bio.MN q-bio.SC 新提交 80%

Cell Division Changes Fate Decisions in a Genetic Toggle Switch

细胞分裂改变遗传开关中的命运决定

Charli Austin, Nikola Popovic, Ramon Grima

专题命中 其他科学智能 :研究细胞分裂对遗传开关命运的影响

AI总结 本研究通过分析布尔型遗传开关模型,发现细胞分裂可将相同初始条件的轨迹导向不同稳定态,并定义了忽略分裂时命运预测错误的区域,表明分裂可重塑多稳态调控网络的命运边界。

Comments 16 pages; 7 figures. Includes new Figure A.2 comparing the separatrices of the classical and Boolean toggle switches, with and without cell division. Two appendices (H and I in the previous version) integrated into Appendix E for clarity

详情
AI中文摘要

基因调控网络通过多稳态动力学控制细胞命运决定。遗传开关是此类行为的经典模型;然而,细胞分裂对其动力学的影响仍知之甚少。我们推导了有无分裂的简化布尔型开关的解析分界线。我们证明,分裂可以将具有相同初始条件的轨迹重定向到相反的稳定态,并定义了一个不一致区域,在该区域中,如果忽略分裂,则命运预测错误。我们的结果表明,分裂可以从根本上重塑多稳态调控网络中的命运边界。

英文摘要

Gene regulatory networks govern cellular fate decisions through multistable dynamics. The genetic toggle switch is a canonical model of such behaviour; yet, the impact of cell division on its dynamics remains poorly understood. We derive analytical separatrices for a simplified Boolean toggle switch with and without division. We show that division can redirect trajectories with identical initial conditions to opposing stable states, and we define a region of disagreement where fate decisions are predicted incorrectly if division is neglected. Our results imply that division can fundamentally reshape fate boundaries in multistable regulatory networks.

2606.12660 2026-06-19 math.NT math.AC math.GR 新提交 80%

Root Clusters and Multiclusters over Imperfect Hilbertian Fields

根簇与多簇在不完美希尔伯特域上的推广

Shubham Jaiswal

专题命中 其他科学智能 :将根簇理论推广到一般域,属于数学理论扩展

AI总结 将根簇理论从完全域推广到一般域,引入根簇大小、多簇大小等概念,并在希尔伯特域上建立了这些广义概念的逆问题结果。

Comments 37 pages. Updated version

详情
AI中文摘要

我们将根簇理论从完全域推广到不一定完全的一般域。对于任意基域上的域扩张,我们引入了以下概念并研究了它们的性质:根簇大小、多簇大小及其推广根容量、多根容量;上升指数、上升正规指数及其推广交指数、交正规指数;复合指数和复合正规指数。我们在希尔伯特域上建立了这些广义概念的逆问题的结果,这推广了我们先前在数域上的结果。特别地,我们证明在给定的希尔伯特域上,存在给定次数、簇大小和多簇大小的多项式,以及存在给定根容量和多根容量的扩张(相对于该多项式)。

英文摘要

We extend the theory of root clusters from perfect fields to general fields, which are not necessarily perfect. We introduce the following notions for field extensions over any given base field and study their properties: root cluster size and multicluster size, with their generalizations root capacity and multiroot capacity; ascending index and ascending normal index, with their generalizations intersection indicium and intersection normal indicium; and compositum indicium and compositum normal indicium. We establish results on the inverse problems for these generalized notions over Hilbertian fields, generalizing our earlier results over number fields. In particular, we show that over a given Hilbertian field there exists a polynomial with given degree, cluster size and multicluster size, and an extension with given root capacity and multiroot capacity with respect to that polynomial.

2606.12194 2026-06-19 math.CO math.NT 新提交 80%

Beating Product Constructions for Linear Equations Over Finite Fields

击败有限域上线性方程组的乘积构造

Paul Hametner, Fred Tyrrell

专题命中 其他科学智能 :有限域上线性方程组的组合数学研究

AI总结 本文证明,对于任何避免非平凡解的亏格一平移不变线性方程的子集A,存在更高维度的子集B也避免非平凡解,且其密度大于A的密度,从而说明仅通过直接乘积无法得到渐近最优下界。

Comments 10 pages

详情
AI中文摘要

我们证明,对于任何 $A\subseteq \mathbb{F}_q^n$,如果它缺乏亏格一的平移不变线性方程的非平凡解(即系数的任何非空真子集之和不为 $0$),那么存在某个更高维度的集合 $B\subseteq \mathbb{F}_q^m$,它也缺乏非平凡解,并且满足 \[|B|^{1/m}>|A|^{1/n}.\] 特别地,这意味着在 $\mathbb{F}_3^n$ 中,没有固定的帽集能通过直接乘积单独给出渐近最优下界。

英文摘要

We show that for any $A\subseteq \mathbb{F}_q^n$ lacking non-trivial solutions to a translation-invariant linear equation of genus one, meaning that no nonempty proper subset of the coefficients sums to $0$, there is a set $B\subseteq \mathbb{F}_q^m$ in some higher dimension which also lacks non-trivial solutions, such that \[|B|^{1/m}>|A|^{1/n}.\] In particular, this implies that no fixed cap set in $\mathbb{F}_3^n$ gives an asymptotically optimal lower bound by direct products alone.
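
The product construction that the theorem improves upon can be checked by brute force in the cap-set case. The sketch below is an illustration, not the paper's method: it takes the equation $x+y+z=0$ over $\mathbb{F}_3$ (whose non-trivial solutions are exactly three distinct collinear points), and the helper names `is_cap_set` and `direct_product` are hypothetical.

```python
from itertools import combinations, product


def is_cap_set(points):
    """True if no three distinct points of F_3^n sum to zero coordinate-wise."""
    for x, y, z in combinations(list(points), 3):
        if all((a + b + c) % 3 == 0 for a, b, c in zip(x, y, z)):
            return False
    return True


def direct_product(A, B):
    """Cartesian product of A in F_3^n and B in F_3^m, as points of F_3^(n+m)."""
    return [a + b for a, b in product(A, B)]


# A cap set of size 4 in F_3^2.
A = [(0, 0), (0, 1), (1, 0), (1, 1)]
AA = direct_product(A, A)       # lives in F_3^4

rate_A = len(A) ** (1 / 2)      # |A|^(1/n) with n = 2
rate_AA = len(AA) ** (1 / 4)    # |A x A|^(1/2n)

print(is_cap_set(A), is_cap_set(AA), len(AA), rate_A, rate_AA)
```

The product keeps the solution-free property but leaves $|A|^{1/n}$ unchanged, which is why direct products alone cannot raise the growth rate. In this small case the product has 16 points while the largest cap set in $\mathbb{F}_3^4$ is known to have 20, a concrete instance of a higher-dimensional set beating the product.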

2606.10358 2026-06-19 cs.LG cs.AI 新提交 80%

KG-SoftMAP: Soft Knowledge-Graph Priors for Bayesian Network Structure Learning from Sparse Discrete Data

KG-SoftMAP: 基于软知识图谱先验的稀疏离散数据贝叶斯网络结构学习

Guoliang Xu, James E. Corter

发表机构 * Columbia University(哥伦比亚大学)

专题命中 其他科学智能 :贝叶斯网络结构学习,结合知识图谱先验

AI总结 针对稀疏离散数据中贝叶斯网络结构学习困难的问题,提出KG-SoftMAP方法,将加权有向知识图谱编码为软先验,结合BDeu评分与logit形式先验最大化MAP目标,在合成与真实数据上显著提升结构恢复性能。

Comments 41 pages including appendices, 2 figures

详情
AI中文摘要

从稀疏离散数据中学习贝叶斯网络(BN)结构是困难的:当每个实例仅记录少数变量时,大多数变量对缺乏可靠评分所需的联合观测,纯数据方法几乎恢复不出结构。然而,可表示为加权有向知识图谱(KG)的不完美领域知识通常是可用的。我们提出KG-SoftMAP,它将这样的KG编码为有限强度、置信度加权的边先验,并最大化结合BDeu评分与logit形式先验的MAP目标;KG可由专家整理或由LLM提取。在具有已知DAG的合成基准上,KG-SoftMAP在观测率$\rho=0.05$时达到有向F1(DF1)$0.19$--$0.32$,在$\rho\geq0.2$时达到DF1 $0.44$--$0.97$,而在相同的稀疏掩码下,所有被测试的纯数据学习器都接近零。恢复性能随KG质量变化:受控的破坏使其平滑退化,零信号KG得到DF1 $0.00$,而精确率和召回率均不完美的、由LLM盲提取的KG仍能带来显著的恢复。在三个真实的稀疏教育数据集上,学习到的BN充当概念级后验模型:在SAF上,它的F1_FAIL与逻辑回归(LR)相差不超过$0.03$,同时提供可检查的概念图、校准的Fail概率以及基于部分观测的可处理后验查询。

英文摘要

Learning Bayesian network (BN) structure from sparse discrete data is hard: when each instance records only a few variables, most variable pairs lack the joint observations needed for reliable scoring, and data-only methods recover little structure. However, imperfect domain knowledge, expressible as a weighted directed knowledge graph (KG), is often available. We propose KG-SoftMAP, which encodes such a KG as a finite-strength, confidence-weighted edge prior and maximizes a MAP objective combining the BDeu score with a logit-form prior; the KG may be expert-curated or LLM-extracted. On synthetic benchmarks with known DAGs, KG-SoftMAP reaches Directed-F1 (DF1) $0.19$--$0.32$ at observation rate $\rho=0.05$ and DF1 $0.44$--$0.97$ at $\rho\geq0.2$, while every data-only learner tested stays near zero under the same sparse masks. Recovery tracks KG quality: controlled corruption degrades it smoothly, a zero-signal KG yields DF1 $0.00$, and a blindly LLM-extracted KG with imperfect precision and recall still drives substantial recovery. On three real sparse educational datasets, the learned BN acts as a concept-level posterior model: on SAF it matches logistic regression (LR) within $0.03$ F1_FAIL while providing an inspectable concept graph, calibrated Fail probabilities, and tractable posterior queries from partial observations.

2606.09545 2026-06-19 math.NT 新提交 80%

On the Smallest Counterexample to the Log-Concavity of the D'Arcais Polynomials

关于 D'Arcais 多项式对数凹性的最小反例

Steven Charlton, Bernhard Heim, Johann Stumpenhusen

专题命中 其他科学智能 :D'Arcais多项式对数凹性反例的数学研究

AI总结 通过改进渐近方法,确定了 D'Arcais 多项式对数凹性猜想的最小反例为 λ=65,214,507,758,400,并研究了反例的渐近密度。

Comments 17 pages; minor typos corrected

详情
AI中文摘要

最近,Starr 使用渐近方法反驳了 Heim--Neuhauser 和 Abdesselam 关于 D'Arcais 多项式对数凹性的猜想,但没有给出具体的反例。我们改进了渐近方法,给出了关于 $\sigma_{-1}$ 卷积的必要估计,并确定了第一个反例为 $\lambda=65\,214\,507\,758\,400$。我们还考虑了此类反例的渐近密度。

英文摘要

Recently, Starr used asymptotic methods to disprove a conjecture by Heim--Neuhauser and Abdesselam about the log-concavity of the D'Arcais polynomials, without giving an explicit counterexample. We refine the asymptotics to give the necessary estimates on convolutions of $\sigma_{-1}$, and identify the first counterexample at $\lambda = 65\,214\,507\,758\,400$. We also consider the asymptotic density of such counterexamples.
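
For readers unfamiliar with the objects involved, the D'Arcais polynomials $P_n(x)$ are the coefficients of $\prod_{m\ge 1}(1-q^m)^{-x} = \sum_{n\ge 0} P_n(x)\,q^n$ and satisfy the recurrence $P_n(x) = \frac{x}{n}\sum_{k=1}^{n}\sigma(k)\,P_{n-k}(x)$ with $P_0 = 1$. The sketch below computes the first few exactly; the function names are hypothetical, and it makes no attempt to reproduce the paper's asymptotic analysis or its counterexample.

```python
from fractions import Fraction


def sigma(k):
    """Sum of the positive divisors of k."""
    return sum(d for d in range(1, k + 1) if k % d == 0)


def darcais_polynomials(n_max):
    """Return [P_0, ..., P_n_max] as coefficient lists (index = power of x)."""
    polys = [[Fraction(1)]]                      # P_0(x) = 1
    for n in range(1, n_max + 1):
        acc = [Fraction(0)] * n                  # the sum has degree at most n - 1
        for k in range(1, n + 1):
            for i, coeff in enumerate(polys[n - k]):
                acc[i] += sigma(k) * coeff
        # Multiply by x (shift by one) and divide by n.
        polys.append([Fraction(0)] + [coeff / n for coeff in acc])
    return polys


def evaluate(poly, x):
    """Evaluate a coefficient list at x."""
    return sum(coeff * x**i for i, coeff in enumerate(poly))


polys = darcais_polynomials(7)
print([evaluate(P, 1) for P in polys])    # partition numbers p(0), ..., p(7)
```

Two classical specializations serve as sanity checks: $P_n(1)$ is the partition number $p(n)$, and $P_n(-1)$ is the $n$-th coefficient of Euler's product $\prod_{m\ge1}(1-q^m)$.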

2606.09524 2026-06-19 math.GR 新提交 80%

On the Quartic-free A-groups

关于四次自由A-群

Prashun Kumar

专题命中 其他科学智能 :有限群结构理论,纯数学研究

AI总结 研究四次自由A-群的结构,并确定可解四次自由A-群的导长。

Comments 7 pages

详情
AI中文摘要

一个有限群被称为四次自由的,如果它的阶不被任何素数$p$的$p^4$整除。一个有限群被称为$A$-群,如果它的所有Sylow子群都是阿贝尔群。本文的目的是提供四次自由$A$-群的显式结构。此外,在提供显式结构的过程中,我们还确定了可解四次自由$A$-群的导长。

英文摘要

A finite group is said to be quartic-free if its order is not divisible by $p^4$ for any prime $p$. A finite group is called an $A$-group if all of its Sylow subgroups are abelian. The objective of this paper is to provide the explicit structure of a quartic-free $A$-group. Further, in the process of providing the explicit structure, we also determine the derived length of a solvable quartic-free $A$-group.
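
The arithmetic condition in the first definition is easy to test on a group order. The following is a minimal sketch using trial division; the function name `is_quartic_free` is hypothetical and the snippet says nothing about the group structure itself.

```python
def is_quartic_free(n):
    """True if no prime p satisfies p**4 | n, for a positive integer n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    p = 2
    # A prime p with p**4 | n must satisfy p**4 <= n, so the loop can stop early.
    while p**4 <= n:
        if n % p == 0:
            exponent = 0
            while n % p == 0:      # strip all factors of p and count them
                n //= p
                exponent += 1
            if exponent >= 4:
                return False
        p += 1
    return True


for order in (60, 16, 162, 1080):
    print(order, is_quartic_free(order))
```

For example, the alternating group $A_5$ has order $60 = 2^2 \cdot 3 \cdot 5$, which is quartic-free, and its Sylow subgroups are abelian, so it is a quartic-free $A$-group in the sense above.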

2503.04507 2026-06-19 q-bio.QM cs.CG cs.LG 交叉投稿 80%

The Morse Transform for Discrete Shape Analysis

离散形状分析的Morse变换

Alexander M. Tanaka, Aras T. Asaad, Richard Cooper, Vidit Nanda

专题命中 其他科学智能 :提出Morse变换量化几何形状,用于配体筛选

AI总结 提出一种基于定向分段线性Morse理论的拓扑变换,通过记录多个高度函数下的临界点来量化嵌入对象的几何形状,生成的特征向量在配体虚拟筛选中取得最优平均AUROC。

Comments 37 pages, 3 main figures, 2 main tables, 12 appendix figures and 4 appendix tables

详情
AI中文摘要

物体的几何形状在调节其与物理世界的相互作用中起着至关重要的作用。然而,为了统计推断或分类任务的目的,用数值描述几何信息仍然困难。在这里,我们引入了一种新的拓扑变换,它利用定向分段线性Morse理论,通过编录多个高度函数下的临界点来量化嵌入对象的几何形状。该Morse变换的输出记录了表征底层形状的临界点的高度和局部拓扑类型(峰、谷或鞍点),保留了比欧拉特征变换更精细的信息,同时自然优先考虑形状的最外层区域。关键的是,该输出可以进一步压缩为丰富而紧凑的特征向量。我们将Morse特征向量作为配体虚拟筛选(LBVS)的描述符进行基准测试,这本质上依赖于分子的形状。在常见的梯度提升树分类流程下,与其他拓扑变换描述符和标准基于形状的LBVS描述符相比,Morse描述符实现了最高的平均AUROC。

英文摘要

The geometry of an object plays a vital role in modulating its interactions with the physical world. It nevertheless remains difficult to describe geometric information numerically for the purposes of statistical inference or classification tasks. Here, we introduce a new topological transform which leverages directional piecewise-linear Morse theory to quantify the geometry of an embedded object by cataloguing critical points across multiple height-functions. The output of this Morse transform records both the heights and the local topological type (peak, trough or saddle) of the critical points that characterise the underlying shape, retaining finer information than the Euler characteristic transform whilst naturally prioritising a shape's outermost regions. Crucially, this output can be further compressed into a rich but compact feature vector. We benchmark the Morse feature vector as a descriptor for ligand-based virtual screening (LBVS), which intrinsically depends on the shape of molecules. Under a common gradient-boosted tree classification pipeline, Morse descriptors achieve the highest mean AUROC when compared to other topological transform descriptors and to standard shape-based LBVS descriptors.
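
The idea of cataloguing critical points under a height function can be sketched in the simplest possible setting, a closed polygon in the plane. This is a toy illustration under stated assumptions rather than the paper's transform: a curve has only peaks and troughs (saddles need a surface), the function name `critical_points` is hypothetical, and the direction is assumed generic so that no two adjacent vertices share a height.

```python
def critical_points(polygon, direction):
    """Classify vertices of a closed polygon as peaks or troughs of the
    height function h(p) = <p, direction>; returns sorted (height, type) pairs."""
    dx, dy = direction
    heights = [x * dx + y * dy for x, y in polygon]
    n = len(polygon)
    found = []
    for i, h in enumerate(heights):
        prev_h = heights[i - 1]            # index -1 wraps to the last vertex
        next_h = heights[(i + 1) % n]
        if h > prev_h and h > next_h:
            found.append((h, "peak"))
        elif h < prev_h and h < next_h:
            found.append((h, "trough"))
    return sorted(found)


# A notched shape; the direction is tilted slightly to avoid ties in height.
polygon = [(0, 0), (2, 0), (2, 2), (1, 1), (0, 2)]
points = critical_points(polygon, direction=(0.1, 1.0))
for height, kind in points:
    print(f"{height:.1f} {kind}")
```

Repeating this over several directions and concatenating the recorded heights and types gives a feature vector in the spirit of the description above; for a closed curve the numbers of peaks and troughs always agree.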

2606.19580 2026-06-19 stat.ME stat.ML 新提交 75%

Machine Learning Integrated in Wavelet Shrinkage (MLShrink)

机器学习集成小波收缩 (MLShrink)

Dixon Vimalajeewa, Vijini Lakmini, Brani Vidakovic

专题命中 其他科学智能 :结合机器学习与小波收缩进行信号去噪

AI总结 提出MLShrink,结合小波收缩与机器学习,通过双阈值对中间带系数进行数据自适应分类,保留经典阈值简单性,理论证明其非扩张性和oracle一致性,在非平滑信号上表现优异。

详情
AI中文摘要

实践中遇到的数据经常被加性噪声污染,小波收缩仍是非参数估计中恢复潜在信号的基本工具。经典方法如硬阈值和软阈值几乎完全根据系数的大小决定是否保留。尽管在许多情况下有效,这些规则对于幅度落在信号与噪声区分不确定的中间区域的系数可能过于僵化。我们提出MLShrink,一种将小波收缩与机器学习相结合的双阈值小波去噪过程。低于下阈值的系数被丢弃,高于上阈值的系数被保留,中间带的系数使用局部小波域特征进行分类。这样,MLShrink在远离决策边界处保留了经典阈值的简单性,同时允许对模糊系数进行数据自适应决策。本文还为此架构开发了一个理论框架。我们证明MLShrink是一个非扩张的支撑集选择规则,推导出一个基于oracle的风险分解,表明多余的去噪风险由未决策带上的分类误差决定,并在分类器性能的适当假设下建立了oracle一致性结果。在标准基准信号上的模拟实验表明,MLShrink与几种已建立的小波收缩方法具有竞争力,尤其适用于具有不规则、边缘丰富或非平滑结构的信号。这些发现表明,中间阈值带上的学习决策为经典小波去噪与现代统计学习之间提供了有用且可解释的联系。

英文摘要

Data encountered in practice are frequently contaminated by additive noise, and wavelet shrinkage remains a fundamental tool for recovering underlying signals in nonparametric estimation. Classical procedures such as hard and soft thresholding decide whether to retain a wavelet coefficient almost entirely from its magnitude. Although effective in many settings, these rules can be too rigid for coefficients whose magnitudes fall in an intermediate region where the distinction between signal and noise is uncertain. We propose MLShrink, a two-threshold wavelet denoising procedure that combines wavelet shrinkage with machine learning. Coefficients below a lower threshold are discarded, coefficients above an upper threshold are retained, and coefficients in the intermediate band are classified using local wavelet-domain features. In this way, MLShrink preserves the simplicity of classical thresholding away from the decision boundary while allowing data-adaptive decisions for ambiguous coefficients. The paper also develops a theoretical framework tailored to this architecture. We show that MLShrink is a nonexpansive support-selection rule, derive an oracle-based risk decomposition showing that excess denoising risk is determined by classification errors on the undecided band, and establish an oracle-consistency result under suitable assumptions on classifier performance. Simulation experiments on standard benchmark signals indicate that MLShrink is competitive with several established wavelet shrinkage methods and is especially effective for signals with irregular, edge-rich, or non-smooth structure. These findings suggest that learned decisions on the intermediate threshold band provide a useful and interpretable connection between classical wavelet denoising and modern statistical learning.
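
The two-threshold rule described above can be sketched as follows. The abstract does not specify the features or the classifier, so `neighbour_energy_rule` is a hypothetical hand-written stand-in for a learned model, and the thresholds and coefficients are made-up values; only the keep, kill, or classify structure follows the description.

```python
import numpy as np


def two_threshold_shrink(coeffs, lower, upper, classify):
    """Keep coefficients with |c| > upper, zero those with |c| < lower,
    and let `classify` decide for the band lower <= |c| <= upper."""
    c = np.asarray(coeffs, dtype=float)
    mag = np.abs(c)
    keep = mag > upper
    band = (mag >= lower) & (mag <= upper)
    if band.any():
        idx = np.flatnonzero(band)
        keep[idx] = classify(c, idx)     # boolean decision per ambiguous index
    return np.where(keep, c, 0.0)


def neighbour_energy_rule(c, idx, radius=1, level=1.0):
    """Toy stand-in for a trained classifier: keep an ambiguous coefficient
    when the mean magnitude of its neighbours is at least `level`."""
    mag = np.abs(c)
    decisions = []
    for i in idx:
        lo, hi = max(0, i - radius), min(len(c), i + radius + 1)
        neighbours = np.delete(mag[lo:hi], i - lo)   # drop the coefficient itself
        decisions.append(neighbours.size > 0 and neighbours.mean() >= level)
    return np.array(decisions, dtype=bool)


coeffs = [0.1, 3.0, 1.2, 0.2, 1.2, 0.1, -4.0]
denoised = two_threshold_shrink(coeffs, lower=1.0, upper=2.0,
                                classify=neighbour_energy_rule)
print(denoised)
```

Both coefficients of magnitude 1.2 fall in the ambiguous band, but only the one next to a large coefficient is kept, which is the kind of context-dependent decision a magnitude-only threshold cannot make.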

2606.19870 2026-06-19 physics.med-ph 新提交 75%

Physiological Sex-Specific Haematocrit Has Minimal Effect on Coronary Computational Haemodynamics: Modelling Implications for Blood Rheology

生理性别特异性血细胞比容对冠状动脉计算血流动力学影响极小:血液流变学建模启示

C. Shen, M. Zhang, T. Shalaby, C. S. McLachlan, S. Beier

专题命中 其他科学智能 :冠状动脉血流动力学建模,属于科学智能应用

AI总结 本研究通过冠状动脉计算流体动力学模拟,发现生理范围内女性特异性血细胞比容(40%)对血流动力学指标影响统计显著但绝对差异极小,表明标准流变学模型适用于多数冠状动脉CFD研究。

详情
AI中文摘要

血细胞比容影响血液粘度,可能影响冠状动脉计算流体动力学(CFD)。然而,以往研究考察了宽泛或病理性的血细胞比容范围,尚不清楚生理范围内女性特异性血细胞比容变化是否对冠状动脉血流动力学产生有意义的变化。分析了15例女性冠状动脉,包括健康动脉和轻度、中度及重度狭窄的病变模型。开发了血细胞比容依赖的Carreau-Yasuda模型。使用标准流变学模型和女性特异性血细胞比容模型(40%)进行CFD模拟。比较了冠状动脉树、动脉节段、分叉处、狭窄血管及相应狭窄区域的时间平均内皮剪切应力(TAESS)、ESS梯度(ESSG)、时间剪切变化指数(TSVI)、螺旋度以及低/高TAESS暴露。女性特异性模型在所有指标和冠状动脉区域均与标准模型产生统计显著差异(p < 0.05)。然而,绝对差异很小,表明血流动力学影响有限。Bland-Altman分析显示窄偏倚和一致性界限。线性回归显示,对于TAESS、ESSG、螺旋强度及不良TAESS暴露,模型间差异与血流动力学幅度之间存在显著关联,但斜率较小。在狭窄动脉中也观察到类似发现,两种模型在不同狭窄严重程度下均捕捉到可比的流动扰动。生理范围内女性特异性血细胞比容变化在计算上可检测,但在冠状动脉CFD中血流动力学上可忽略。因此,标准流变学模型可能足以用于大多数冠状动脉CFD研究,而个性化血细胞比容建模更适用于血细胞比容异常的患者或流变学重点研究。

英文摘要

Haematocrit influences blood viscosity and may affect coronary computational fluid dynamics (CFD). However, previous studies examined broad or pathological haematocrit ranges, and it remains unclear whether female-specific haematocrit variations within the physiological range produce meaningful changes in coronary haemodynamics. Fifteen female coronaries were analysed, including healthy arteries and diseased models with mild, moderate and severe stenosis. A haematocrit-dependent Carreau-Yasuda model was developed. CFD simulations were performed using the standard rheology model and a female-specific haematocrit-based model (40%). Time-averaged endothelial shear stress (TAESS), ESS gradient (ESSG), temporal shear variation index (TSVI), helicity, and low/high TAESS exposure were compared across coronary trees, arterial segments, bifurcations, stenosed vessels and corresponding narrowed regions. The female-specific model produced statistically significant differences from the standard model across all metrics and coronary regions (p < 0.05). However, the absolute differences were small, indicating a limited haemodynamic impact. Bland-Altman analysis showed narrow biases and limits of agreement. Linear regression demonstrated significant associations between inter-model differences and haemodynamic magnitude for TAESS, ESSG, helicity intensity, and adverse TAESS exposure, but the slopes were small. Similar findings were observed in stenosed arteries, where both models captured comparable flow disturbances across stenosis severities. Female-specific haematocrit variation within the physiological range is computationally detectable but haemodynamically negligible in coronary CFD. A standard rheology model is therefore likely sufficient for most coronary CFD studies, while personalised haematocrit modelling is more relevant for patients with abnormal haematocrit or rheology-focused studies.
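
The Carreau-Yasuda law referred to above has the general form $\mu(\dot\gamma) = \mu_\infty + (\mu_0 - \mu_\infty)\,[1 + (\lambda\dot\gamma)^a]^{(n-1)/a}$. The sketch below evaluates it with illustrative parameter values of the kind often quoted for blood; these numbers are assumptions, not values from the paper, and how the paper makes the parameters depend on haematocrit is not stated in the abstract.

```python
def carreau_yasuda_viscosity(shear_rate, mu_0, mu_inf, lam, n, a):
    """Dynamic viscosity (Pa.s) of a Carreau-Yasuda fluid at a shear rate (1/s).

    mu_0 and mu_inf are the zero- and infinite-shear viscosities, lam is a
    relaxation time (s), n the power-law index and a the Yasuda exponent.
    """
    return mu_inf + (mu_0 - mu_inf) * (1.0 + (lam * shear_rate) ** a) ** ((n - 1.0) / a)


# Illustrative parameters only (assumed here, not taken from the paper).
params = dict(mu_0=0.056, mu_inf=0.00345, lam=1.902, n=0.22, a=1.25)

for rate in (0.0, 1.0, 100.0, 1000.0):
    mu = carreau_yasuda_viscosity(rate, **params)
    print(f"shear rate {rate:7.1f} 1/s -> viscosity {mu:.5f} Pa.s")
```

With $n < 1$ the model is shear-thinning: viscosity starts at $\mu_0$ at rest and approaches $\mu_\infty$ at high shear rates, so changes in the low-shear parameters matter most in slow or recirculating regions.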

2606.20249 2026-06-19 astro-ph.EP physics.geo-ph 新提交 75%

Geophysical and atmospheric implications of $f$O$_{2}$-dependent melting on rocky exoplanets

岩石系外行星上依赖于氧逸度的熔融对地球物理和大气的影响

Mariana Sastre, Tim Lichtenberg, Laurent Soucasse, Dan J. Bower, Harrison Nicholls, Inga Kamp

专题命中 其他科学智能 :系外行星内部-大气耦合模拟

AI总结 通过耦合内部-大气框架PROTEUS,量化了氧逸度依赖的熔融曲线对岩石系外行星热结构、熔融分数和流变演化的非线性影响,揭示了挥发分库存和表面氧逸度对热状态的主要调控作用。

Comments 15 pages, 8 figures; accepted for publication in Astronomy & Astrophysics

详情
AI中文摘要

长期存在的岩浆海洋的地球化学演化受到熔融地幔与大气之间挥发分交换的强烈调控。对于处于失控温室极限内的行星,这种耦合演化可以持续数十亿年。然而,大多数现有研究假设类地(氧化)条件,并忽略了氧化还原状态对熔体热力学和挥发分释放的影响。我们量化了在耦合内部-大气框架PROTEUS中实现的实验推导的、氧逸度依赖的熔融曲线如何传播到岩石系外行星内部的热结构、熔融分数和流变演化,并将其应用于短周期超级地球GJ 1132 b。我们发现熔融曲线的变化导致强烈的非线性热响应。在贫挥发分系统中,相对于氧化和类地情况,还原熔融曲线促进了早期深部地幔结晶,有利于由温室效应维持的晚期表面岩浆海洋,而氧化熔融曲线则维持较高的熔融分数和垂直延伸的岩浆海洋。还原地幔产生大量的H$_2$-CO富集大气;氧化地幔则倾向于较薄的H$_2$O-CO$_2$包层。在富挥发分系统中,内部在高熔融分数下达到辐射平衡,维持稳态全球岩浆海洋,其中熔融曲线的变化不会显著影响凝固时间。这表明了层次控制:挥发分库存和表面氧逸度作为热状态的主要调节者,而氧逸度依赖的熔融关系提供次级调制。这些对比鲜明的状态产生不同的大气组成和形成时间尺度,为近距离岩石系外行星提供了可测试的光谱预测,这些预测可通过即将进行的JWST观测进行评估。

英文摘要

The geochemical evolution of long-lived magma oceans is strongly regulated by volatile exchange between the molten mantle and the atmosphere. For planets inside the runaway-greenhouse limit, this coupled evolution can persist for billions of years. However, most existing studies assume Earth-like (oxidized) conditions and neglect the influence of redox state on melt thermodynamics and volatile release. We quantified how experimentally derived, oxygen-fugacity-dependent melting curves implemented within the coupled interior-atmosphere framework PROTEUS propagate into the thermal structure, melt fraction, and rheological evolution of rocky exoplanet interiors, applying this to the short-period super-Earth GJ 1132 b. We found strongly non-linear thermal responses to variations in melting curves. In volatile-poor systems, reduced melting curves promote earlier deep-mantle crystallisation relative to oxidised and Earth-like cases, favouring late-stage surface magma oceans sustained by greenhouse warming, while oxidized melting curves maintain higher melt fractions and a vertically extended magma ocean. Reduced mantles produce massive H$_2$-CO-rich atmospheres; oxidized mantles favour thinner H$_2$O-CO$_2$ envelopes. In volatile-rich systems, the interior reaches radiative equilibrium at high melt fractions, sustaining a steady-state global magma ocean in which melting curve variations do not significantly influence solidification timing. This indicates a hierarchical control: volatile inventory and surface oxygen fugacity act as the primary regulators of thermal state, while oxygen-fugacity-dependent melting relations provide a secondary modulation. These contrasting regimes produce distinct atmospheric compositions and formation timescales, offering testable spectral predictions for close-in rocky exoplanets evaluable with forthcoming JWST observations.