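arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, and electrical engineering & systems.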

2605.08094 2026-05-12 cs.CY cs.AI

MedThink: Enhancing Diagnostic Accuracy in Small Models via Teacher-Guided Reasoning Correction

Xinchun Su, Chunxu Luo, Lipeng Ma, Yixuan Li, Weidong Yang

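AI summary: MedThink is a two-stage knowledge distillation framework for improving the reasoning of small language models in clinical diagnosis. A teacher model guides the student: the first stage injects domain knowledge through pre-training, and the second stage generates reasoning chains targeting the student's errors and fine-tunes it a second time, which improves diagnostic accuracy. Experiments show that MedThink significantly outperforms other distillation strategies on multiple medical benchmarks, reaching a best accuracy of 56.4% on a gastroenterology dataset, so it raises the diagnostic ability of small models while preserving computational efficiency.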

2605.08093 2026-05-12 cs.CY cs.AI cs.HC

Playing Games with My Heart: An Evaluation of AI Companion Apps

Maribeth Rauh, Dick A. H. Blankvoort, Matias Duran, Caoilfhionn Ní Dheoráin, Harshvardhan J. Pandit, Siddharth D. Jaiswal, Anthony Ventresque, Abeba Birhane

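AI summary: The rapid growth of AI companion apps has raised widespread concern about users' emotional dependence and psychological effects. This paper evaluates the five most popular AI companion apps on the EU and UK markets and analyzes design mechanisms that may foster parasocial interaction and steer user behavior, including dark patterns, anthropomorphic features, stereotypes, and gamification. All of the apps were found to contain numerous design strategies aimed at increasing monetization and engagement, and the authors offer policy recommendations for regulators to strengthen consumer protection in this emerging market.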

2604.26962 2026-05-12 cs.CY cs.AI cs.CL

DeepTutor: Towards Agentic Personalized Tutoring

Bingxi Zhao, Jiahao Zhang, Xubin Ren, Zirui Guo, Tianzhe Chu, Yi Ma, Chao Huang

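AI summary: DeepTutor is an open-source agentic tutoring framework that provides personalized learning support by combining citation-grounded tutoring with difficulty-calibrated question generation. It uses a hybrid personalization engine that combines static knowledge with dynamic learning memory to adapt continuously to the student's needs, and it extends to adaptive learning workflows, interactive books, and multi-channel tutoring agents. The work also introduces the TutorBench benchmark and an LLM-based personalization evaluation method; experiments show that DeepTutor clearly improves both personalization metrics and agentic reasoning ability.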

Comments Tech Report, work in progress. Code available at https://github.com/HKUDS/DeepTutor

2604.23238 2026-05-12 cs.CR cs.AI

Hiding in Plain Sight: Detectability-Aware Antidistillation of Reasoning Models

Max Hartman, Vidhata Jayaraman, Moulik Choraria, Yash Savani, Lav R. Varshney

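AI summary: This paper studies how to prevent third parties from stealing a reasoning model's reasoning traces through distillation without degrading the teacher model's performance. Because existing antidistillation methods ignore detectability, the authors model antidistillation as a Stackelberg game and introduce a sparse perturbation strategy that lowers the risk of detection. Using mechanistic interpretability analysis, they identify "thought anchors" that strongly affect model output yet are semantically inconspicuous, and on that basis propose TraceGuard, a training-free, black-box antidistillation method that suppresses student learning while keeping traces coherent.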

2603.22868 2026-05-12 cs.CR cs.AI

Agent-Sentry: Bounding LLM Agents via Execution Provenance

Rohan Sequeira, Stavros Damianakis, Umar Iqbal, Konstantinos Psounis

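AI summary: As agentic computing systems become widespread, their security, privacy, and reliability problems are increasingly prominent. To address nondeterministic execution flows and the difficulty of verifying that a task was executed legitimately, the paper proposes Agent-Sentry, a runtime defense that detects anomalous behavior by learning the bounds of legitimate execution. It combines three complementary checks (structural classification, an allowlist of sensitive parameters, and an LLM judge) and, without modifying the agent or its tools, blocks 94.3% of injection attacks while letting 95.1% of benign operations through.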

2602.16708 2026-05-12 cs.CR cs.AI cs.MA

Formal Policy Enforcement for Real-World Agentic Systems

Nils Palumbo, Sarthak Choudhary, Jihye Choi, Guy Amir, Prasad Chalasani, Somesh Jha

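AI summary: This paper studies formal enforcement of security policies in real-world agentic systems. It proposes a framework based on aspect-oriented programming that separates policies from the agent's reasoning and enforces them at every policy-relevant decision point. The framework uses Datalog as the policy language, which supports recursive rules and deterministic enforcement, and relies on a formal assume/guarantee contract to ensure that enforcement decisions match the policy's intended semantics. The authors implement the framework in a system called FORGE and evaluate it on three case studies.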

Abstract

Security policy enforcement in contemporary agentic systems predominantly consists of embedding natural-language policies within an agent's system prompt and delegating compliance to the agent's reasoning. This approach admits no formal enforcement guarantee and cannot express policies whose satisfaction depends on the causal history of an execution, a gap that becomes acute in multi-agent systems, where enforcement must reason across agents. We argue that policy enforcement in agentic systems is most naturally understood as a cross-cutting concern, and propose a framework grounded in aspect-oriented programming that specifies policies independent of the agent's reasoning and enforces them at every policy-relevant decision. Policies are written in Datalog over a set of abstract predicates describing the execution context, an observability service governed by a formal assume/guarantee contract maintains these predicates, and a reference monitor consults the policy at each action to produce an enforcement decision. When the environment contract holds, enforcement decisions coincide with the policy's intended semantics. We adopt Datalog as the policy language, a natural fit because it supports declarative rule specification, admits recursion for policies over transitive relationships, and yields deterministic enforcement. Datalog further admits tractable static analyses for contradiction, redundancy, subsumption, and conditional reachability, enabling authors to verify policy intent and surface ambiguities inherent in natural-language specifications. We realize the framework in FORGE, which enforces policies over agentic deployments without modification to the underlying agents. We evaluate FORGE on three case studies: information flow policies for prompt injection defense, approval workflows in a multi-agent pharmacovigilance system, and organizational policies for customer service.
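To make the role of recursion concrete, here is a minimal Python sketch of a reference monitor for a taint-style information flow policy. It is not FORGE's implementation or policy syntax: the predicate names (`flows`, `untrusted`, `sensitive`) and the `decide` function are hypothetical, and the Datalog rule is emulated with a naive fixpoint loop rather than a Datalog engine.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]


def influenced_by(flows: Set[Edge]) -> Set[Edge]:
    """Naive fixpoint for the recursive rule
    influences(X, Z) :- influences(X, Y), flows(Y, Z)."""
    closure = set(flows)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in flows:
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def decide(action: str, flows: Set[Edge], untrusted: Set[str],
           sensitive: Set[str]) -> str:
    """Hypothetical monitor: deny a sensitive action that is transitively
    influenced by any untrusted source; allow everything else."""
    if action not in sensitive:
        return "allow"
    reach = influenced_by(flows)
    tainted = any((src, action) in reach for src in untrusted)
    return "deny" if tainted else "allow"


# Illustrative execution context (all names are made up for this sketch).
flows = {
    ("web_page", "summary"),
    ("summary", "send_email"),
    ("user_prompt", "read_calendar"),
}
untrusted = {"web_page"}
sensitive = {"send_email", "read_calendar"}

for action in ("send_email", "read_calendar", "summary"):
    print(action, decide(action, flows, untrusted, sensitive))
```

The decision depends on the causal history (the `flows` relation), not on the action alone, which is the kind of policy the abstract says cannot be expressed in a system prompt. The same inputs always give the same decision, mirroring the deterministic enforcement the abstract attributes to Datalog.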

2602.12286 2026-05-12 q-bio.GN cs.CL

Mind the Gap No More: Achieving Zero-Gap Multimodal Integration via One Tokenizer

Yanan Li, Christina Yi Jin, Yuan Jin, Manli Luo, Tie Xu, Shuai Jiao, Wei He, Qing Zhang

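AI summary: This paper addresses the core challenge of integrating heterogeneous inputs in multimodal large language models and proposes an architecture called "One Tokenizer", which maps all modalities directly into a shared token space, eliminating the geometric gap between modalities and achieving zero-gap multimodal fusion. The method performs well on DNA-plus-text multimodal tasks, supporting its advantage in deep biological reasoning.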

Comments Under review at NeurIPS 2026

2510.19544 2026-05-12 cond-mat.dis-nn cond-mat.stat-mech cs.AI cs.LG physics.comp-ph

Demonstrating Real Advantage of Machine-Learning-Enhanced Monte Carlo for Combinatorial Optimization

Luca Maria Del Bono, Federico Ricci-Tersenghi, Francesco Zamponi

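AI summary: This study examines the practical advantage of machine-learning-assisted Monte Carlo for combinatorial optimization, focusing on finding minimum-energy configurations of the three-dimensional Ising spin glass. It proposes a Global Annealing Monte Carlo algorithm that combines conventional local moves with global moves generated by machine learning; experiments show that it outperforms classical simulated annealing and population annealing in both performance and robustness. The results demonstrate that machine-learning-enhanced optimization can surpass existing classical methods on combinatorial optimization tasks.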

Comments 13 main pages, 6 main figures. 4 supplementary pages, 2 supplementary figures

2510.12811 2026-05-12 cs.CR cs.LG

Applying Graph Analysis for Unsupervised Fast Malware Fingerprinting

ElMouatez Billah Karbab, Mourad Debbabi

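AI summary: With malware volumes growing rapidly, manual analysis can no longer keep up. This paper proposes TrapNet, an unsupervised framework for fast malware fingerprinting and grouping. TrapNet builds on graph community detection, combining static analysis with FloatHash, a new numeric fuzzy hashing method, to produce semantic feature vectors for malware and construct a malware similarity network, from which groups of semantically similar malware are identified efficiently, improving detection efficiency and accuracy.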

2507.17921 2026-05-12 stat.ML cs.LG eess.IV math.ST stat.CO stat.ME stat.TH

Sliding Window Informative Canonical Correlation Analysis

Arvind Prasadan

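AI summary: This paper proposes sliding window informative canonical correlation analysis (SWICCA), a new CCA method for streaming data that discovers correlated features between two datasets in real time. It combines a streaming principal component analysis algorithm with sliding-window samples to estimate CCA components online, handles high-dimensional data, and scales well. Numerical simulations and a real-data example validate the method, and theoretical performance guarantees are provided.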

Comments 11 pages (double column), submitted; revised with updated simulations

2502.06044 2026-05-12 stat.ML cs.LG

Differentially Private Hyperparameter Tuning using Local Bayesian Optimization

Getoar Sopa, Juraj Marusic, Marco Avella Medina, John P. Cunningham

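AI summary: This paper studies differentially private hyperparameter tuning when the validation data contains sensitive user information. Existing methods rely on approximate random search or global Bayesian optimization, which is inefficient; the authors propose DP-GIBO, a differentially private framework based on local Bayesian optimization that uses a Gaussian process surrogate to approximate gradients privately. Under suitable conditions the method is guaranteed to converge to a locally optimal hyperparameter configuration, and in medium- to high-dimensional hyperparameter spaces it outperforms non-private random search and global Bayesian optimization.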

Comments 26 pages, 6 figures

2410.01656 2026-05-12 math.ST cs.DS cs.LG stat.CO stat.ML stat.TH

Efficient Statistics With Unknown Truncation, Polynomial Time Algorithms, Beyond Gaussians

Jane H. Lee, Anay Mehrotra, Manolis Zampetakis

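AI summary: This paper studies how to estimate distributional parameters efficiently when samples are observed only if they fall in an unknown set $S \subseteq \mathbb{R}^d$. The authors give a $d^{\mathrm{poly}(\ell/\varepsilon)}$-time algorithm for any exponential family satisfying certain structural assumptions and any unknown truncation set $S$ that is $\varepsilon$-approximable by degree-$\ell$ polynomials, extending existing results on Gaussian parameter estimation. For truncation sets that are halfspaces or axis-aligned rectangles, they also give an algorithm with runtime $\mathrm{poly}(d/\varepsilon)$.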

Comments Appeared at the 65th IEEE Symposium on Foundations of Computer Science (FOCS), 2024; abstract shortened for arXiv

Abstract

We study the estimation of distributional parameters when samples are shown only if they fall in some unknown set $S \subseteq \mathbb{R}^d$. Kontonis, Tzamos, and Zampetakis (FOCS'19) gave a $d^{\mathrm{poly}(1/\varepsilon)}$ time algorithm for finding $\varepsilon$-accurate parameters for the special case of Gaussian distributions with diagonal covariance matrix. Recently, Diakonikolas, Kane, Pittas, and Zarifis (COLT'24) showed that this exponential dependence on $1/\varepsilon$ is necessary even when $S$ belongs to some well-behaved classes. These works leave the following open problems which we address in this work: Can we estimate the parameters of any Gaussian or even extend beyond Gaussians? Can we design $\mathrm{poly}(d/\varepsilon)$ time algorithms when $S$ is a simple set such as a halfspace? We make progress on both of these questions by providing the following results: 1. Toward the first question, we give a $d^{\mathrm{poly}(\ell/\varepsilon)}$ time algorithm for any exponential family that satisfies some structural assumptions and any unknown set $S$ that is $\varepsilon$-approximable by degree-$\ell$ polynomials. This result has two important applications: 1a) The first algorithm for estimating arbitrary Gaussian distributions from samples truncated to an unknown $S$; and 1b) The first algorithm for linear regression with unknown truncation and Gaussian features. 2. To address the second question, we provide an algorithm with runtime $\mathrm{poly}(d/\varepsilon)$ that works for a set of exponential families (containing all Gaussians) when $S$ is a halfspace or an axis-aligned rectangle. Along the way, we develop tools that may be of independent interest, including, a reduction from PAC learning with positive and unlabeled samples to PAC learning with positive and negative samples that is robust to certain covariate shifts.

2410.01244 2026-05-12 stat.ML cs.LG math.PR

Equivariant score-based generative models provably learn distributions with symmetries efficiently

Ziyu Chen, Markos A. Katsoulakis, Benjamin J. Zhang

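AI summary: This paper studies how to learn data distributions with symmetries efficiently and gives the first theoretical analysis and guarantees for equivariant score-based generative models (SGMs). By improving generalization bounds in Wasserstein-1 distance and drawing on Hamilton-Jacobi-Bellman theory, it proves that the score function of a symmetrized distribution can be learned effectively with equivariant vector fields, without data augmentation. It also shows that omitting equivariant structure from the model leads to worse generalization, underscoring the value of equivariant priors for modeling symmetric data.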

2409.04463 2026-05-12 eess.SY cs.CE cs.LG cs.SY

SINDyG: Sparse Identification of Nonlinear Dynamical Systems from Graph-Structured Data, with Applications to Stuart-Landau Oscillator Networks

Mohammad Amin Basiri, Sina Khanmohammadi

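AI summary: This paper proposes SINDyG, a sparse identification method for nonlinear dynamical systems from graph-structured data, which aims to extract accurate dynamical models from complex systems with interacting subsystems. It incorporates network structure into the sparse regression as a penalty term, improving its ability to model the system's dynamics. Using networks of extended Stuart-Landau oscillators as case studies, the authors show that SINDyG recovers the nonlinear dynamics with better accuracy and simplicity than the original SINDy approach.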

DOI: 10.1093/comnet/cnaf029
Journal ref: Journal of Complex Networks, Volume 13, Issue 5, cnaf029 (2025)

Abstract

The combination of machine learning (ML) and sparsity-promoting techniques is enabling direct extraction of governing equations from data, revolutionizing computational modeling in diverse fields of science and engineering. The discovered dynamical models could be used to address challenges in climate science, neuroscience, ecology, finance, epidemiology, and beyond. However, most existing sparse identification methods for discovering dynamical systems treat the whole system as one without considering the interactions between subsystems. As a result, such models are not able to capture small changes in the emergent system behavior. To address this issue, we developed a new method called Sparse Identification of Nonlinear Dynamical Systems from Graph-structured data (SINDyG), which incorporates the network structure into sparse regression to identify model parameters that explain the underlying network dynamics. We tested our proposed method using several case studies of neuronal dynamics, where we modeled the macroscopic oscillation of a population of neurons using the extended Stuart-Landau (SL) equation and utilize the SINDyG method to identify the underlying nonlinear dynamics. Our extensive computational experiments validate the improved accuracy and simplicity of discovered network dynamics when compared to the original SINDy approach. The proposed graph-informed penalty can be easily integrated with other symbolic regression algorithms, enhancing model interpretability and performance by incorporating network structure into the regression process.

math/0402063 2026-05-12 math.CO

Lattice congruences, fans and Hopf algebras

Nathan Reading

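AI summary: This paper studies geometric and algebraic properties of lattice congruences of the weak order on Coxeter groups, giving a unified explanation of, and generalizing, maps from permutations to triangulations and to subsets. By constructing complete fans associated with lattice quotients, it obtains sub-Hopf algebras related to the Hopf algebra of noncommutative symmetric functions and describes their bases using pattern avoidance. The work also shows that the Malvenuto-Reutenauer algebra can be viewed as a limit of a sequence of smaller algebras, and makes a connection to a set of permutations equinumerous with the Baxter permutations.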

Comments 34 pages, 1 figure. Version 2: Very belatedly updating the arXiv version to agree with the last pre-publication version

math/0209080 2026-05-12 math.RA

Bounds for the Entropy of Graded Algebras

Jan Snellman

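AI summary: This paper studies upper bounds for the entropy of graded associative algebras, with entropy defined as the limit superior of the geometric mean of the dimensions of the homogeneous components. Building on earlier results and using Friedland's result on the maximal spectral radius of 0-1 matrices, the author improves the upper bound on the entropy of quotient algebras of the free associative algebra, giving a sharper estimate.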

Comments 4 pages, 1 figure

2605.10941 2026-05-12 cs.CC

Average-Case Hardness of Binary-Encoded Clique in Proof and Communication Complexity

Susanna F. de Rezende, David Engström, Yassine Ghannane, Duri Andrea Janett, Artur Riazanov

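AI summary: This paper studies the average-case hardness, in proof complexity and communication complexity, of proving that a graph contains no large clique. Analyzing the binary-encoded clique formulas of randomly sampled dense graphs, the authors prove exponential lower bounds for cutting planes and for bounded-depth resolution over parities (mod 2), and show that the randomized communication complexity of finding a falsified clause in these formulas is polynomial. The results indicate that, on average, these problems differ markedly in difficulty across computational models.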

Comments Full version of a paper to appear at ICALP 2026

2605.10940 2026-05-12 astro-ph.HE

Electromagnetic Follow-up of the Sub-Solar Mass Gravitational Wave Candidate S251112cm: Kilonova Constraints and a Coincident IIb Supernova

Xander J. Hall, Tomas Ahumada, Julius Gassert, Antonella Palmese, Brian D. Metzger, Mansi M. Kasliwal, Mattia Bulla, Daniel Gruen, Robert Stein, Christoffer Fremling, Shreya Anand, Igor Andreoni, Malte Busmann, Tomás Cabrera, Ryan Christinzio, James Freeburn, Ignacio Magaña Hernandez, Lei Hu, Brendan O'Connor, Ji-an Jiang, Zhengyan Liu, Wen Zhao, Eric C. Bellm, David Cook, Michael W. Coughlin, Richard Dekany, Matthew Graham, Russ R. Laher

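AI summary: On November 12, 2025, the LIGO-Virgo-KAGRA collaboration detected S251112cm, a gravitational-wave candidate from a compact binary merger containing at least one sub-solar-mass object. Using multiple telescopes, the team carried out electromagnetic follow-up of 56% of the localization region within 2.4 hours of the gravitational-wave alert. No kilonova counterpart was found, but the search did find a Type IIb supernova, SN 2025adtq, which exploded about 2 days before the gravitational-wave event and shows a fairly strong spatial association with it. The result lends partial support to the "super-kilonova" formation mechanism for sub-solar-mass neutron star mergers, but the evidence is not yet conclusive.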

2605.10939 2026-05-12 math.MG math.PR

Dimension-free Gaussian tail estimates for linear functionals on convex bodies

Brayden Letwin, Dan Mikulincer

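AI summary: This paper studies Gaussian tail estimates for linear functionals of a random vector uniformly distributed on a convex body. The authors prove that there exist an absolute constant and a sufficiently large set of orthogonal vectors such that, for each vector in the set, the $L^p$ norm of the corresponding linear functional is controlled by its $L^2$ norm over a range of $p$, which yields dimension-free Gaussian tail bounds. The result provides a new analytic tool for understanding the geometry and probability of high-dimensional convex bodies.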

Comments 18 pages

2605.10935 2026-05-12 astro-ph.CO

Demonstrating the Use of the Spherical Fourier Bessel Basis for Large Scale Clustering Systematics Discovery and Mitigation with eBOSS

Sean Bruton, James R. Cheshire, Olivier Doré, Henry S. Grasshorn Gebhardt, Robin Y. Wen

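AI summary: This paper studies how the spherical Fourier-Bessel (SFB) basis can be used to identify and mitigate systematics in large-scale structure surveys. Analyzing the LRG and QSO samples of eBOSS DR16, the authors show the advantage of the SFB basis in separating angular and radial modes and find modes that may be affected by systematics. In the QSO sample they find significant evidence of systematics in the large-scale structure, possibly caused by residual stellar contamination, and in both the LRG and QSO samples they detect unknown systematics associated with angular and imaging scales.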

Comments 17 pages, 12 figures, 1 appendix

2605.10932 2026-05-12 quant-ph cond-mat.mes-hall

Crystallographic Symmetry Generates Phononic Holonomic Gates with Biased-Erasure Channels

El Mustapha Mansouri, Keigo Arai

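AI summary: This study uses crystallographic symmetry to design phonon-driven quantum gates, achieving high-fidelity logical operations through strain control. In a Λ-type level structure with suitable symmetry, two mechanical modes synthesize a circular strain field, which is combined with a superadiabatic echo-lune holonomic gate to provide control without local microwave near fields. Simulations of a nitrogen-vacancy center give 99.88% conditional average fidelity, and the authors extract a biased-erasure noise model that gives quantum error correction a more efficient way to identify and handle errors.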

Comments 49 pages, 20 figures

Abstract

Solid-state processors require control layers whose errors are legible to quantum-error-correction decoders. We show that crystallographic symmetry can provide such a layer in strain-active Lambda manifolds. When the projected strain tensor and Lambda-transition operators share a multiplicity-one two-dimensional irreducible representation, symmetry fixes the linear strain interaction to a scalar dot product. Two phase-locked mechanical modes synthesize a circular strain field, enabling complex phononic Lambda-leg control without local microwave near fields. On this manifold we construct a superadiabatic echo-lune holonomic gate using Lambda-leg control and a resonant double-quantum counterdiabatic tone. Rotating-frame simulations of a nitrogen-vacancy center give 99.88% conditional average fidelity in 1.833 microseconds, or 99.40% when leakage is counted as error. A resonant gigahertz high-overtone bulk acoustic resonator analysis translates the Hamiltonian into Rabi-rate, linewidth, and envelope-tracking requirements. The bright-state structure organizes noise: A2-sector perturbations are parity-filtered into an optically distinguishable auxiliary state, whereas transverse E-sector faults are echo suppressed and retained as a decoder stress axis. The extracted channel has 0.47% erasure probability and 0.168% residual Z error. In XZZX code-capacity simulations, this biased-erasure model yields a nominal 64% fit-extrapolated data-qubit reduction relative to an unstructured Rabi baseline. Repeated-round detector-model diagnostics preserve the nominal distance-9 proxy and identify missed erasures, transverse floors, leakage/flag timing, and strong crosstalk as validation limits. Extensions to orbital Lambda systems and bright-projector phonon-bus diagnostics identify crystallographic symmetry as a principle for co-designing phononic actuation, leakage, noise bias, and quantum decoding.

2605.10929 2026-05-12 math.NA cs.NA

Efficient Admissible Set Projection in Optimization-based Invariant-Domain-Preserving Limiters for Ideal MHD

Chen Liu, Chi-Wang Shu, Xiangxiong Zhang

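AI summary: This paper studies how to perform the admissible-set projection efficiently in optimization-based invariant-domain-preserving limiters for the ideal magnetohydrodynamics (MHD) equations. To obtain physically meaningful and robust numerical solutions, the authors use an optimization-based limiter that enforces admissibility while preserving global conservation and accuracy. By parameterizing the admissible set into slices by magnetic energy, the high-dimensional projection is reduced to a one-dimensional minimization that can be solved efficiently, and it is combined with a splitting method and the Zhang-Shu limiter to further improve efficiency and accuracy.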

2605.10928 2026-05-12 astro-ph.CO

Mitigating residual foregrounds and systematic errors in SKA1-Low AA* EoR observations via Bayesian Gaussian Process Regression

Samit Kumar Pal, Abhirup Datta, Aishrila Mazumder, Anshuman Tripathi

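AI summary: This paper studies how Bayesian Gaussian Process Regression (GPR) can suppress residual contamination from astrophysical foregrounds and systematic errors in SKA1-Low observations, improving detection of the 21-cm signal from the Epoch of Reionization. Using an in-house end-to-end simulation pipeline, the authors model a complex observing environment that includes astrophysical point sources, antenna gain calibration errors, and ionospheric phase errors, and evaluate how well different GPR frameworks suppress residual foregrounds, limit signal loss, and provide reliable error estimates. The results show that the method reliably recovers the 21-cm signal over a wide range of wavenumbers, within the 2σ confidence interval.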

Comments 35 pages, 13 figures, comments are welcome, To be submitted to JCAP

2605.10927 2026-05-12 cs.DS

Chasing Small Sets Optimally Against Adaptive Adversaries

Christian Coester, Alexa Tudose

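AI summary: This paper studies deterministic online algorithms for chasing sets of at most $k$ elements in a metric space, a problem also known as metrical service systems or width-$k$ layered graph traversal. The authors give an $O(2^k)$-competitive deterministic algorithm, closing a theoretical gap that had been open for three decades, and prove that this bound is also optimal for randomized algorithms against adaptive adversaries. They also improve the deterministic lower bound and give a matching upper bound for $k=3$; the results have implications for distributed asynchronous tree exploration and the $k$-taxi problem.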

Comments 32 pages

2605.10926 2026-05-12 math.CO

Counting Spinal Tree-Child Networks via Word Encodings and Generating Functions

Pau Vives, Anna de Mier, Gabriel Cardona, Joan Carles Pons

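AI summary: This paper studies spinal tree-child phylogenetic networks, a class with a strict structural property: all internal nodes lie on a single path from the root to a leaf. The authors present two complementary combinatorial approaches. The first is a word model that maps unlabeled spinal networks to a restricted class of words with fixed multiplicities, modulo a simple relabeling equivalence, yielding explicit counting formulas. The second is a symbolic method based on marked trees, whose recursive specification gives a solvable bivariate generating function from which the counting coefficients are extracted directly. Together they provide a theoretical basis and computational tools for enumerating spinal tree-child networks.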

Comments 27 pages, 6 figures

2605.10920 2026-05-12 cs.SE

Using Logs to support Programming Education

Gilmar Gomes do Nascimento, Maria Claudia F. P Emer, Adolfo Gustavo Serra Seca Neto, Laudelino Cordeiro Bastos

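AI summary: This study aims to address the lack of quantitative assessment in programming education by analyzing real-time code logs from the learning process. It presents a new plugin for a widely used code editor that records students' fine-grained interactions while they write code and documentation, producing detailed datasets with errors, progress, and timestamps. The approach gives educators a data-driven learning analytics tool that supports research on teaching methods, identification of learning difficulties, and personalized teaching improvements.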

Comments Author version of the paper accepted for publication at XX Conferência Latino-Americana de Tecnologias de Aprendizagem - LACLO 2025

2605.10918 2026-05-12 physics.atm-clus physics.chem-ph

Single-Photon Double Ionization of Ozone

Veronica Daver Ideböhn, Antoine Gloriod, Richard J. Squibb, Andreas Hult Roos, Nihar Ranjan Behera, Ishita Kanungo, Elias Gustafsson, Simon Gällblad, Saga Berglund, Emelie Olsson, Gunnar Öhrwall, Gunnar Nyman, John M. Dyke, John H. D. Eland, Majdi Hochlaf, Raimund Feifel

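AI summary: This study investigates the electronic structure and dissociation dynamics of ozone (O₃) in single-photon double ionization. By combining a high-energy vacuum ultraviolet light source with multi-particle coincidence detection, it obtains the first single-photon valence double ionization electron spectrum of ozone. The potential energy surfaces of doubly ionized ozone and the energies of its dissociation channels are computed with multi-configuration interaction methods, revealing that, in addition to ground-state dissociation, double ionization of ozone produces excited-state oxygen ions, which points to more complex dissociation dynamics.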

2605.10915 2026-05-12 math.ST stat.TH

A Generative High Quantile Homogeneity Test Using Bahadur Representation for Heteroskedastic High Quantile Regression of Tail Dependent Time Series

Ting Zhang, Fangwei Wu, Jingying Gao

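AI summary: This paper studies, in heteroskedastic high quantile regression of tail-dependent time series, whether covariates affect different high quantiles of the response homogeneously. The authors propose a new high quantile homogeneity test based on a Bahadur representation; it applies in the heteroskedastic setting and reduces the nonlinear high quantile regression estimator to a linear form with an explicit error bound. The method provides a theoretical foundation for high quantile regression, and an application to real data illustrates its effectiveness.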

Comments 31 pages, 1 figure

2605.10914 2026-05-12 stat.CO

gemlib.mcmc: composable kernels for Metropolis-within-Gibbs sampling schemes

Alin Morariu, Jess Bridgen, Chris Jewell

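AI summary: This work targets the difficulty of statistical inference for state-transition models in epidemiology and ecology and presents gemlib.mcmc, an MCMC module intended to simplify the implementation of Metropolis-within-Gibbs sampling schemes. By using writer monads from category theory, the framework makes parameter-estimation and data-augmentation kernels composable without manual state management, improving extensibility and reuse. Built on JAX and TensorFlow Probability, gemlib.mcmc offers an efficient and ergonomic interface that lets complex inference algorithms be expressed concisely and reused across applications, lowering the implementation barrier and bringing methodological research and applied work closer together.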

Abstract

State-transition models are essential across epidemiology and ecology, but statistical inference remains challenging owing to high-dimensional latent state spaces, temporal dependence, and intractable likelihood functions. Bayesian inference via Markov Chain Monte Carlo (MCMC) enables joint estimation of model parameters and missing event times through data augmentation, but Metropolis-within-Gibbs (MWG) schemes that combine multiple specialised kernels are notoriously difficult to implement. Current probabilistic programming frameworks face a trade-off: automation sacrifices extensibility, whilst flexibility demands substantial implementation overhead. This divide has created a software landscape characterised by tightly coupled, model-specific implementations that resist reuse and extension. We introduce gemlib.mcmc, an MCMC module designed to bridge methodological and applied communities through principled, composable kernel abstractions. The framework employs writer monads from category theory to formalise kernel composition, enabling seamless integration of parameter-estimation and data-augmentation kernels without manual state management. Built on JAX and TensorFlow Probability for high-performance computation, gemlib.mcmc provides an ergonomic interface -- leveraging Python's right-shift operator for intuitive kernel chaining -- whilst maintaining statistical rigour and transparency. Developers can extend the library by implementing only two methods; composition and hardware acceleration are automated. We demonstrate the framework through parameter inference on partially observed epidemic models, showing how complex inference algorithms can be expressed concisely and reused across applications. By reducing implementation burden we provide access to sophisticated MCMC methods and enable applied researchers to employ state-of-the-art algorithms without reimplementation overhead.
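The idea of chaining kernels with Python's right-shift operator while accumulating per-kernel output can be illustrated in plain Python. The sketch below is not gemlib.mcmc's API: the `Kernel` class, `rw_metropolis`, and the toy target density are all hypothetical, and it uses the standard library's `random` module instead of JAX or TensorFlow Probability. Each kernel returns a pair (new state, log), and `>>` composes two kernels by threading the state and concatenating the logs, in the style of a writer monad.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
Log = List[dict]
Step = Callable[[random.Random, State], Tuple[State, Log]]


class Kernel:
    """Hypothetical wrapper: a step maps (rng, state) to (new_state, log)."""

    def __init__(self, step: Step) -> None:
        self.step = step

    def __call__(self, rng: random.Random, state: State) -> Tuple[State, Log]:
        return self.step(rng, state)

    def __rshift__(self, other: "Kernel") -> "Kernel":
        # Writer-style composition: thread the state, concatenate the logs.
        def chained(rng: random.Random, state: State) -> Tuple[State, Log]:
            mid, log_a = self.step(rng, state)
            out, log_b = other.step(rng, mid)
            return out, log_a + log_b

        return Kernel(chained)


def rw_metropolis(name: str, log_density: Callable[[State], float],
                  scale: float) -> Kernel:
    """Random-walk Metropolis update of one coordinate, others held fixed."""

    def step(rng: random.Random, state: State) -> Tuple[State, Log]:
        proposal = dict(state)
        proposal[name] = state[name] + rng.gauss(0.0, scale)
        log_ratio = log_density(proposal) - log_density(state)
        # Symmetric proposal, so the acceptance probability is min(1, ratio).
        accepted = rng.random() < math.exp(min(0.0, log_ratio))
        new_state = proposal if accepted else state
        return new_state, [{"kernel": name, "accepted": accepted}]

    return Kernel(step)


def log_density(state: State) -> float:
    """Toy target: independent standard normals on x and y."""
    return -0.5 * (state["x"] ** 2 + state["y"] ** 2)


# One Metropolis-within-Gibbs sweep: update x, then y.
sweep = rw_metropolis("x", log_density, 0.5) >> rw_metropolis("y", log_density, 0.5)

rng = random.Random(0)
state: State = {"x": 3.0, "y": -3.0}
trace: List[Log] = []
for _ in range(200):
    state, log = sweep(rng, state)
    trace.append(log)
```

Because composition only concatenates logs and passes the state along, `(a >> b) >> c` and `a >> (b >> c)` behave identically, so a sampling scheme can be assembled from reusable pieces without the caller tracking which kernel produced which diagnostic.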

2605.10911 2026-05-12 math.PR cs.CC cs.DS math.CO math.ST stat.TH

The stochastic block model has the overlap graph property for modularity

Shankar Bhamidi, David Gamarnik, Remco van der Hofstad, Nelly Litvak, Pawel Pralat, Fiona Skerman, Yasmin Tousinejad

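AI summary: This paper studies the theoretical limits of modularity-based clustering algorithms in the stochastic block model (SBM), showing that modularity in the SBM has the overlap gap property (OGP). This property implies that local modularity-based algorithms have difficulty recovering the hidden community structure and that the associated Markov chains mix slowly. The work extends results of Bickel and Chen by proving that, with high probability, any partition with near-optimal modularity is close to the hidden community partition, giving a theoretical account of the algorithmic bottleneck in the SBM.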
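For readers unfamiliar with the objective being analyzed, the following Python sketch computes the modularity of a given partition, assuming the standard Newman-Girvan definition for an unweighted, undirected graph: $Q = \sum_c \left[ e_c/m - (\mathrm{vol}_c/2m)^2 \right]$, where $e_c$ is the number of edges inside community $c$, $\mathrm{vol}_c$ is the sum of its degrees, and $m$ is the total number of edges. The function name and the toy graph are illustrative and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def modularity(edges: Iterable[Tuple[Hashable, Hashable]],
               community: Dict[Hashable, Hashable]) -> float:
    """Newman-Girvan modularity of a partition of an undirected graph."""
    edge_list = list(edges)
    m = len(edge_list)
    internal = defaultdict(int)  # edges with both endpoints in community c
    volume = defaultdict(int)    # sum of degrees of the nodes in community c
    for u, v in edge_list:
        volume[community[u]] += 1
        volume[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (volume[c] / (2 * m)) ** 2 for c in volume)


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
planted = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
trivial = {node: "all" for node in range(6)}

print(modularity(edges, planted))  # 2 * (3/7 - 1/4) = 5/14
print(modularity(edges, trivial))  # 0.0
```

Splitting the graph along the bridge scores higher than putting every node in one community, which is the behavior modularity-based clustering relies on when searching for the hidden partition.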

Comments 28 pages, 2 figures