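arXivDaily daily academic digest: synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.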

2606.08554 2026-06-09 cs.LG New submission

A Theoretical Analysis of Memory and Overfitting Phenomena in Stochastic Interpolation Models

Yunchen Li, Shaohui Lin, Zhou Yu

AI summary: Using closed-form solutions, this paper analyzes memorization in stochastic interpolation models. It shows that in continuous time both deterministic and stochastic generation processes recover training samples, that discretization and estimation error cause samples to deviate from them, and it gives theoretical definitions of overfitting and underfitting.

Abstract

This paper provides a theoretical account of memorization in stochastic interpolation models. By leveraging closed-form expressions for the optimal velocity field and the associated score function, we show that, in the continuous-time oracle setting, both deterministic and stochastic generation processes recover training samples. Under Euler discretization, generated samples remain centered around training samples, with deviations controlled by the step size. We further analyze generation in the presence of estimation errors and show that accumulated estimation errors control the endpoint deviation from the training set. These results imply that the generated sample admits a representation as a training sample perturbed by three controlled terms: a discretization-induced bound, an estimation-error-induced bound, and stochastic Gaussian noise. Based on this characterization, we provide theoretical definitions of overfitting and underfitting in generative models. Synthetic simulations support our theoretical findings.
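
The oracle result can be illustrated with a small numerical sketch. It assumes (the abstract does not specify this) a one-dimensional linear interpolant x_t = (1 - t) z + t x_1 with z ~ N(0, 1) and x_1 drawn uniformly from a finite training set; under that assumption the optimal velocity has the closed form v*(x, t) = (E[x_1 | x_t = x] - x) / (1 - t). The function names and the toy training set are hypothetical, not taken from the paper.

```python
import math
import random

def optimal_velocity(x, t, train):
    """Closed-form velocity (E[x1 | x_t = x] - x) / (1 - t) for the assumed
    linear interpolant x_t = (1 - t) * z + t * x1, z ~ N(0, 1)."""
    sigma = 1.0 - t
    # Log posterior weight of each training point given x_t = x.
    logw = [-((x - t * xi) ** 2) / (2.0 * sigma ** 2) for xi in train]
    top = max(logw)  # subtract the maximum for numerical stability
    w = [math.exp(lw - top) for lw in logw]
    post_mean = sum(wi * xi for wi, xi in zip(w, train)) / sum(w)
    return (post_mean - x) / sigma

def euler_sample(train, n_steps, rng):
    """Integrate dx/dt = v*(x, t) from t = 0 to t = 1 with explicit Euler steps."""
    x = rng.gauss(0.0, 1.0)
    h = 1.0 / n_steps
    for k in range(n_steps):
        x += h * optimal_velocity(x, k * h, train)
    return x

if __name__ == "__main__":
    rng = random.Random(0)
    train = [-2.0, 0.5, 3.0]  # hypothetical training set
    print([euler_sample(train, 100, rng) for _ in range(5)])
```

In this toy setting the last Euler step moves the sample onto the posterior mean, which is concentrated on a single training point when the points are well separated, so generated samples should coincide with training samples up to numerical error.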

2606.08491 2026-06-09 cs.AI New submission

What Makes a Desired Graph for Relational Deep Learning?

Yao Cheng, Siqiang Luo

AI summary: The study finds that graphs derived directly from database schemas suffer from information overload and semantic fragmentation, that balancing filtering and injection operations improves performance, and it develops an automatic optimizer.

Comments This article has been accepted by ICML 2026

Abstract

Relational deep learning (RDL) converts relational databases (RDBs) into heterogeneous graphs, but graphs derived directly from database schemas are often not well suited for how graph neural networks (GNNs) perform relational reasoning. We study what makes a relational graph suitable for deep learning and show that schema-derived graphs suffer from two systematic failures: information overload and semantic fragmentation. Our empirical analysis reveals that the desired graph is not the raw schema, but a result of controlled structural adaptation. Performance depends on balancing two operations: mitigating information overload via filtering, and repairing semantic fragmentation via injection. Specifically, filtering serves as a bias-variance knob with non-monotonic effects, while injection improves performance only when it explicitly restores the relational dependencies missing from the original schema. Based on these findings, we develop an end-to-end structural optimizer that applies both operations to adapt relational graphs automatically. Across 26 tasks spanning classification, regression, and recommendation, the optimized graphs consistently improve accuracy while often reducing inference cost.

2606.08417 2026-06-09 cs.CL cs.AI New submission

Hacking Generative Perplexity: Why Unconditional Text Evaluation Needs Distributional Metrics

Antonio Franca, Alexander Tong

AI summary: The paper argues that generative perplexity (gen-PPL) is a flawed evaluation metric for non-autoregressive language models. It constructs zero-parameter naive samplers that reach state-of-the-art gen-PPL on LM1B and OpenWebText while generating incoherent text, and recommends evaluation suites that directly quantify the distributional divergence between generated and reference text.

Comments Accepted to the Workshop on Structured Probabilistic Inference & Generative Modeling (SPIGM) at ICML 2026

Abstract

Diffusion and continuous flow-based language models have emerged as the leading non-autoregressive alternatives to language modeling. Progress in both paradigms is overwhelmingly tracked by generative perplexity (gen-PPL): the per-token negative log-likelihood of samples under a frozen autoregressive (AR) scorer such as gpt2-large, typically paired with an empirical-entropy guardrail to rule out low-entropy collapse. We argue that this metric is unsound. By construction, gen-PPL measures only predictability under the scoring AR, not grammaticality or semantic coherence -- and the set of predictable but still low-quality sequences is combinatorially large. To make this concrete, we construct a suite of zero-parameter, deliberately naive samplers that achieve state-of-the-art gen-PPL on LM1B and OpenWebText at non-degenerate entropy, surpassing recently published diffusion and continuous-flow models while producing text that is incoherent by construction. We recommend evaluation suites that directly quantify the distributional divergence between generated and reference text, and use such a suite to re-benchmark recent non-autoregressive models, recovering a more faithful picture of the current state of the art.
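
To make the metric concrete, here is a minimal sketch of how gen-PPL and an entropy guardrail are commonly computed. It assumes the per-token log-probabilities from the frozen scorer are already available (the call to a scorer such as gpt2-large is omitted), that perplexity is the exponential of the mean per-token negative log-likelihood pooled over all samples, and that the guardrail is unigram entropy; the function names are hypothetical.

```python
import math
from collections import Counter

def gen_ppl(token_logprobs):
    """Perplexity from per-token natural-log probabilities assigned by a
    frozen scorer; token_logprobs holds one inner list per generated sample."""
    flat = [lp for sample in token_logprobs for lp in sample]
    return math.exp(-sum(flat) / len(flat))

def unigram_entropy_bits(tokens):
    """Empirical unigram entropy of one token sequence, in bits."""
    counts = Counter(tokens)
    n = len(tokens)
    return -sum(c / n * math.log2(c / n) for c in counts.values())
```

Both quantities depend only on how predictable and how varied the tokens are, which is why a sequence can score well on both without being grammatical or coherent.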

2606.08306 2026-06-09 cs.LG cs.SI New submission

Towards Graph Foundation Models for Dynamics in Complex Networked Systems: Lessons from Super-Spreader Identification in Multilayer Networks

Michał Czuba, Mateusz Stolarski, Adam Piróg, Piotr Bielak, Piotr Bródka

AI summary: The paper argues that graph foundation models for dynamics need inductive, cross-network generalization. Its ts-net model, trained only on synthetic multilayer networks, generalizes zero-shot to real multilayer networks and outperforms traditional methods.

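2606.08300 2026-06-09 cs.LG New submission

QueryWeaver: Reliable Multi-Tool Query Execution Planning via LLM-Based Graph Generation

Aishwarya Chakravarthy, Vidhi Kulkarni, Duen Horng Chau

AI summary: Proposes a system that converts natural-language queries into structured graphs executed by a deterministic planner, using depth-first search to resolve cross-tool dependencies and achieve highly reliable query execution.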

2606.08285 2026-06-09 cs.AI cs.CE q-fin.CP q-fin.TR New submission

Beyond Agent Architecture: Execution Assumptions and Reproducibility in LLM-Based Trading Systems

Junyi Yao, Zihao Zheng

AI summary: Auditing 30 relevant studies, the paper finds that execution assumptions are under-reported in LLM trading research, which makes results hard to compare, and calls for reporting standards covering execution realism, reproducibility, and evaluation comparability.

Abstract

Large language models (LLMs) and agentic systems are increasingly proposed for financial trading, yet their reported performance remains difficult to compare because studies vary in data provenance, temporal split discipline, execution timing, turnover treatment, and transaction-cost modeling. This article presents a targeted topical review and reproducibility audit of execution realism in LLM-based trading research. A coded evidence matrix covering 30 trade-relevant primary studies is used to assess point-in-time controls, split transparency, held-out evaluation, cost and turnover treatment, execution semantics, universe definition, and artifact release. Across the audited sample, architecture reporting is generally clearer than the evaluation assumptions needed to judge whether a trading result is economically interpretable or reproducible. A 10-equity worked example is included only as a methodological scaffold to illustrate how explicit friction and timing choices can materially compress active-strategy results. The main conclusion is that the next useful step for LLM trading research is not only better agent design, but also clearer reporting standards for execution realism, reproducibility, and evaluation comparability.

2606.08158 2026-06-09 cs.CL cs.AI New submission

Constrained Paraphrase Consistency for LLM Hallucination Detection

Shanshan Lin, Dongsheng Hong, Sibo Ju, Chao Chen, Xi Zhang, Xiangwen Liao

AI summary: Proposes the Consistency-Constrained Hallucination Detector (CCHD), which exploits paraphrase consistency through constrained optimization, needs no additional data, and outperforms existing methods on multiple benchmarks.

Comments Accepted to ICASSP 2026

Abstract

Large language models (LLMs) can generate factually inconsistent claims, motivating accurate and scalable hallucination detectors. Prior work largely enlarges training sets via synthesis or new annotations, introducing increasing cost and potential bias while underusing the consistency implied by semantically equivalent paraphrases. We propose Consistency-Constrained Hallucination Detector (CCHD), which formulates training as a constrained optimization problem. The standard cross-entropy on original document-claim pairs is complemented by (i) paraphrase-consistency constraints bounding divergence across paraphrased views, and (ii) label-preservation constraints tying paraphrases to ground truth. We solve the problem by gradient descent-ascent over model parameters and per-view Lagrange multipliers, adding only a few scalar dual variables and no inference-time overhead. With DeBERTa and Flan-T5 backbones, CCHD consistently outperforms strong baselines (FactCG, MiniCheck, and AlignScore) on standard factuality benchmarks, demonstrating its superiority on hallucination detection.
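
The gradient descent-ascent scheme over primal parameters and Lagrange multipliers can be sketched on a toy problem. This is not CCHD's actual objective: the scalar objective f(theta) = (theta - 2)^2, the single constraint theta - 1 <= 0, the step size, and the function name are all assumptions chosen so the answer can be checked by hand (theta = 1, lambda = 2 from the KKT conditions).

```python
def constrained_gda(steps=2000, lr=0.05):
    """Minimise f(theta) = (theta - 2)^2 subject to g(theta) = theta - 1 <= 0
    by gradient descent on theta and projected gradient ascent on lam."""
    theta, lam = 0.0, 0.0
    for _ in range(steps):
        # Gradients of the Lagrangian L = f(theta) + lam * g(theta).
        grad_theta = 2.0 * (theta - 2.0) + lam
        grad_lam = theta - 1.0
        theta -= lr * grad_theta              # descent step on the primal variable
        lam = max(0.0, lam + lr * grad_lam)   # ascent step; multiplier stays >= 0
    return theta, lam
```

The multiplier grows only while the constraint is violated and is a single scalar per constraint, which matches the abstract's point that the dual variables add little training cost and nothing at inference time.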

2606.08157 2026-06-09 cs.CL New submission

Cross Paraphrastic Invariance Learning for Hallucination Detection

Shanshan Lin, Dongsheng Hong, Sibo Ju, Chao Chen, Sihong Xie, Xiangwen Liao

AI summary: Proposes the CPIL framework, which builds positive and negative pairs for two-stage contrastive learning and, with only about 1% of the labeled data, outperforms baselines across 11 tasks for efficient LLM hallucination detection.

Comments Accepted to ICASSP 2026

Abstract

Large language models (LLMs) frequently generate hallucinations, which are unsupported by a source document. To avoid costly LLM-as-evaluator pipelines and the heavy annotation demands of existing classifiers, we propose CPIL (Cross Paraphrastic Invariance Learning), a two-stage Siamese framework that maximizes the utility of existing labeled data. Concretely, CPIL constructs informative training pairs by: (i) generating paraphrastic views of each document-claim example as positives, and explicitly aligning their representations to enforce invariance to surface form; and (ii) mining same-document, opposite-label pairs as hard negatives to sharpen document-sensitive decision boundaries. CPIL then trains the model in two stages: Stage 1 performs contrastive pretraining to learn a paraphrase-invariant, grounding-aware embedding space; and Stage 2 attaches a lightweight classifier for binary groundedness. On the LLM-AggreFact benchmark (11 tasks), CPIL surpasses strong baselines in F1 score with only ~1% labeled data, showing its predictive superiority and label efficiency.

2606.07904 2026-06-09 cs.AI cs.SE New submission

Contract2Tool: Learning Preconditions and Effects for Reliable Tool-Augmented LLM Agents

Rahul Suresh Babu, Laxmipriya Ganesh Iyer

AI summary: Proposes the Contract2Tool framework, which infers tool contracts from metadata, documentation, and execution traces to enable causal tool filtering, sharply reducing the set of visible tools and token usage while preserving reliability.

Abstract

Tool-augmented large language model agents increasingly rely on external APIs, but standard tool schemas describe how to call a tool, not when the tool is causally appropriate or what task state it produces. Causal tool filtering addresses this gap by using lightweight contracts that specify each tool's preconditions, effects, risk level, and cost. However, manually writing and maintaining such contracts does not scale to large or changing tool ecosystems. We introduce Contract2Tool, a framework for inferring tool contracts from metadata, schemas, documentation, and execution traces. Contract2Tool converts observable tool evidence into normalized symbolic contracts that can be evaluated intrinsically and deployed inside downstream causal tool filtering. We evaluate learned contracts against gold preconditions, effects, and risk labels, and measure their downstream utility on multi-step agent tasks. Our results show that hybrid documentation-and-trace evidence produces contracts accurate enough to preserve most of the reliability and efficiency benefits of gold contracts. Learned-contract CMTF achieves 0.980 downstream success, close to 0.990 for gold-contract CMTF, while reducing visible tools from 100 to 1 and reducing average token usage from 26,172 to 2,528 relative to all-tools exposure. These results suggest that learned contracts can provide a scalable contract layer between tool schemas and reliable agent execution.

2606.07629 2026-06-09 cs.LG cs.AI cs.CL cs.CY cs.HC New submission

Large Language Models Should Learn Personalized Rather Than Aggregated Human Preferences

Cristina Garbacea

AI summary: The paper argues that large language models should learn personalized rather than aggregated preferences, analyzes the theoretical limits and empirical problems of aggregated preferences, and proposes bounded personalization frameworks that balance individual autonomy with collective safety.

Comments Accepted to ICML 2026

Abstract

Current approaches to aligning large language models (LLMs) aggregate diverse human preferences into a single reward signal, effectively optimizing for a hypothetical "average user" who represents no real person particularly well. This position paper argues that LLMs should learn personalized, individual preferences rather than aggregated ones. We show that aggregation masks critical information about preference diversity, individual values, and contextual dependencies, which is a limitation both theoretically grounded in social choice theory and empirically evident across demographic groups. We analyze the rich structure that human preferences encode, survey technical approaches to personalization, and systematically address counterarguments on scalability, shared standards, and manipulation risk. While personalization introduces genuine safety challenges including filter bubbles, value lock-in, and psychological manipulation, we argue these are manageable through bounded personalization frameworks that preserve universal safety constraints while accommodating legitimate individual variation. We conclude with a concrete research and policy agenda for developing preference-aware models that respect both individual autonomy and collective safety.

2606.07616 2026-06-09 cs.LG cs.AI cs.CL New submission

Item Response Scaling Laws: A Measurement Theory Approach for Efficient and Generalizable Neural Scaling Estimation

Sang Truong, Yuheng Tu, Rylan Schaeffer, Sanmi Koyejo

AI summary: Proposes Item Response Scaling Laws (IRSL), which bring item response theory into the scaling-law framework. Using a Beta-IRT model over language models' probabilistic responses, it reduces parameter complexity from O(M×N) to O(M+N) and yields reliable estimates with only 50 questions in both pre-training and test-time scaling settings.

Abstract

Scaling laws provide a fundamental framework for understanding the performance of Language Models (LMs), yet deriving them requires prohibitively expensive evaluations across thousands of checkpoints or millions of inference samples. To address this, we introduce Item Response Scaling Laws (IRSL), a unified framework that integrates Item Response Theory (IRT) within the scaling law framework. Unlike traditional approaches that treat each model-benchmark pair in isolation, IRSL disentangles latent model ability from question characteristics, factorizing the scaling law estimation for M models and N questions to significantly reduce parameter complexity from O(M×N) to O(M+N). We instantiate IRSL with Beta-IRT, which leverages the empirical probability responses of LMs -- such as token probabilities in pre-training and pass rates in test-time sampling -- to capture richer signals than binary responses. We validate our approach across two prevalent scaling paradigms: (1) pre-training downstream scaling, using 6,612 LM checkpoints and 37,682 questions from 10 benchmarks; and (2) test-time scaling, using 12 LMs and 120 questions from 4 benchmarks with up to 2,500 samples per question. Given a one-time calibration on existing model responses, IRSL yields more reliable scaling estimates using only 50 questions per benchmark (a 99.9% reduction), achieving comparable or superior decision accuracy to traditional approaches. Furthermore, we show that the estimated latent model abilities are generalizable, enabling accurate performance forecasting across benchmarks that share the same measurement objective.
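
The factorization idea can be sketched with the simplest IRT model. This is not the paper's Beta-IRT: it assumes a one-parameter logistic (Rasch) model with one ability per model and one difficulty per question, and the function names are hypothetical. It only illustrates how M + N latent parameters determine all M × N model-question predictions.

```python
import math

def rasch_prob(ability, difficulty):
    """Rasch (1PL) IRT: P(correct) = sigmoid(ability - difficulty)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def predict_matrix(abilities, difficulties):
    """Expected score of every model on every question, built from
    len(abilities) + len(difficulties) parameters."""
    return [[rasch_prob(a, b) for b in difficulties] for a in abilities]

def param_counts(m, n):
    """(parameters if each model-question pair is fit separately,
    parameters under the factorized model) for m models and n questions."""
    return m * n, m + n
```

Because question difficulties are shared across models, a new model only needs enough responses to estimate its single ability parameter, which is the intuition behind evaluating on a small question subset.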

2606.07530 2026-06-09 cs.CL New submission

Finding New Connections between Concepts from Medline Database Incorporating Domain Knowledge

Yang Weikang, Chowdhury S. M. Mazharul Hoque, Jin Wei

AI summary: Proposes an adaptive model, modified from Swanson's ABC model, for uncovering hidden connections between concepts in the literature, linking seemingly unrelated topics A and C through an intermediate topic B.

DOI: 10.5772/intechopen.113081
Journal ref: Artificial Intelligence, IntechOpen, 2024

Abstract

In this digital world, data is everything and significantly impacts our everyday lives. Interestingly, in this small world, everything is part of an ecosystem, where everything is connected, directly or indirectly. The same thing happens to data as well. In most cases, it may seem like a particular topic does not have any connection with another one, but in reality, they are connected through a mutually related topic. Therefore, in this research, we will discuss an adaptive model modified from the ABC model by Don R. Swanson, a Literature-Based Discovery (LBD) model, to find the hidden connections between concepts of interest. In this model, two topics, A and C, are distinct and have no direct relationship, but they share a common topic, B, that can be used to connect them. This well-known model will be used in this discussion to connect medical concepts.
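
The ABC pattern can be sketched as a set operation over concept co-occurrences. This is only the basic Swanson idea, not the adaptive model the chapter describes; the function name and the toy corpus are hypothetical, with the corpus loosely modelled on Swanson's fish oil and Raynaud's disease example.

```python
def abc_links(documents, a, c):
    """Return intermediate concepts B that co-occur with A in some document
    and with C in some document, when A and C never co-occur themselves.
    Each document is a set of concept strings."""
    if any(a in doc and c in doc for doc in documents):
        return set()  # A and C are already directly linked, nothing hidden
    with_a = set().union(*(doc - {a} for doc in documents if a in doc))
    with_c = set().union(*(doc - {c} for doc in documents if c in doc))
    return with_a & with_c

if __name__ == "__main__":
    corpus = [  # invented toy corpus
        {"fish oil", "blood viscosity"},
        {"fish oil", "platelet aggregation"},
        {"raynaud's disease", "blood viscosity"},
        {"raynaud's disease", "vasoconstriction"},
    ]
    print(abc_links(corpus, "fish oil", "raynaud's disease"))
```

A practical system would rank the candidate B concepts, for instance by co-occurrence frequency or domain knowledge, rather than returning the raw intersection.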

2605.22208 2026-06-09 cs.CV Updated version

EvoIR-Agent: Self-Evolving Image Restoration Agentic System via Experience-Driven Learning

Kailin Zhuang, Jiawei Wu, Zhi Jin

AI summary: The paper proposes EvoIR-Agent, which uses experience-driven learning to address planning failures caused by a lack of experience in image restoration. A hierarchical experience pool and a self-evolving mechanism improve restoration performance and efficiency; experiments show strong results on full-reference metrics and a notable balance between performance and efficiency.

Comments Temporarily withdrawn for institutional clearance and compliance review. A revised version will be uploaded once the process is finalized

Abstract

Multimodal Large Language Model (MLLM)-driven image restoration agents demonstrate effectiveness in degradation coupling scenarios by flexibly selecting tools and determining removal orders. However, their zero-shot planning often fails without experience, necessitating severe trial-and-error overhead to achieve satisfactory outcomes. Currently, two paradigms are employed to address this issue, yet a dilemma persists: training-based methods embed intrinsic experience into parameters, achieving high inference efficiency but lacking compatibility with new tools or degradations. In contrast, training-free methods utilize explicit experience storage for compatibility but still incur trial-and-error overhead due to naive experience. To resolve the dilemma, we propose EvoIR-Agent, which first systematically formulates the experience components of a training-free image restoration agent. Subsequently, a hierarchical experience pool is constructed, which enables coarse-to-fine guidance for diverse tools and removal orders. Furthermore, a self-evolving mechanism is introduced to update the pool from scratch using accumulated records, thereby greatly improving performance and efficiency. Extensive experiments reveal that EvoIR-Agent achieves a significant lead on full-reference metrics and yields a remarkable Pareto-optimal balance between performance and efficiency compared to state-of-the-art methods.

2511.17855 2026-06-09 cs.AI cs.RO Updated version

QuickLAP: Quick Language-Action Preference Learning for Semi-Autonomous Agents

Jordan Abi Nader, David Lee, Nathaniel Dennler, Andreea Bobu

AI summary: The study proposes QuickLAP, a Bayesian framework that fuses physical and language feedback to infer reward functions in real time. It uses large language models to extract reward-feature attention masks and preference shifts, reducing reward-learning error by over 70% in a semi-autonomous driving simulator, and a user study confirms that it is understandable and collaborative.

Abstract

Robots must learn from both what people do and what they say, but either modality alone is often incomplete: physical corrections are grounded but ambiguous in intent, while language expresses high-level goals but lacks physical grounding. We introduce QuickLAP: Quick Language-Action Preference learning, a Bayesian framework that fuses physical and language feedback to infer reward functions in real time. Our key insight is to treat language as a probabilistic observation over the user's latent preferences, clarifying which reward features matter and how physical corrections should be interpreted. QuickLAP uses Large Language Models (LLMs) to extract reward feature attention masks and preference shifts from free-form utterances, which it integrates with physical feedback in a closed-form update rule. This enables fast, real-time, and robust reward learning that handles ambiguous feedback. In a semi-autonomous driving simulator, QuickLAP reduces reward learning error by over 70% compared to physical-only and heuristic multimodal baselines. A 15-participant user study further validates our approach: participants found QuickLAP significantly more understandable and collaborative, and preferred its learned behavior over baselines. Code is available at https://github.com/MIT-CLEAR-Lab/QuickLAP.

2603.09995 2026-06-09 cs.CL cs.AI Updated version

Context Over Compute: Human-in-the-Loop Outperforms Iterative Chain-of-Thought Prompting in Interview Answer Quality

Kewen Zhu, Zixi Liu, Yanjing Li, Jing Chen

AI summary: Comparing human-in-the-loop and automated chain-of-thought prompting, the paper finds that the human-in-the-loop approach yields better interview answer quality with fewer iterations, along with greater training benefits.

Abstract

Behavioral interview evaluation using large language models presents unique challenges that require structured assessment, realistic interviewer behavior simulation, and pedagogical value for candidate training. We investigate chain-of-thought prompting for interview answer evaluation and improvement through two controlled experiments with 50 behavioral interview question-and-answer pairs. Our contributions are threefold. First, we provide a quantitative comparison between human-in-the-loop and automated chain-of-thought improvement. Using a within-subject paired design with n = 50, both approaches show positive rating improvements. The human-in-the-loop approach provides significant training benefits: confidence improves from 3.16 to 4.16 (p < 0.001) and authenticity improves from 2.94 to 4.53 (p < 0.001, Cohen's d = 3.21). The human-in-the-loop method also requires five times fewer iterations (1.0 versus 5.0, p < 0.001) and achieves full personal detail integration. Second, we analyze convergence behavior. Both methods converge rapidly with mean iterations below one, with the human-in-the-loop approach achieving a 100% success rate compared to 84% for automated approaches among initially weak answers (Cohen's h = 0.82, large effect). Additional iterations provide diminishing returns, indicating that the primary limitation is context availability rather than computational resources. Third, we propose an adversarial challenging mechanism based on a negativity bias model, named bar raiser, to simulate realistic interviewer behavior, although quantitative validation remains future work. Our findings demonstrate that while chain-of-thought prompting provides a useful foundation for interview evaluation, domain-specific enhancements and context-aware approach selection are essential for realistic and pedagogically valuable results.

2507.12843 2026-06-09 cs.LG stat.ML Updated version

Are Two Datasets Close Enough With Statistical Significance? A Kernel Distributional Closeness Testing Approach

Zhijian Zhou, Liuhua Peng, Xunye Tian, Mingming Gong, Feng Liu

AI summary: To address the limitations of distribution closeness testing (DCT) on complex data, the paper proposes NAMMD, a refined measure based on the kernel maximum mean discrepancy (MMD), and builds NAMMD-DCT, which raises test power while keeping the type-I error bounded.

Abstract

Are two distributions close to each other with statistical significance? Distribution closeness testing (DCT) formalizes this question by testing whether the distance between a distribution pair is at least epsilon-far. Existing DCT methods mainly measure discrepancies between distribution pairs defined on discrete spaces, for example using total variation, which limits their application to complex data such as images. To extend DCT to more types of data, a natural idea is to introduce maximum mean discrepancy (MMD), a powerful measure of distributional discrepancy between complex distributions, into DCT scenarios. However, empirical results indicate that many distribution pairs can have the same MMD value despite having different norms in the same reproducing kernel Hilbert space (RKHS). These pairs may exhibit different finite-sample distinguishability and reflect different practical closeness levels, making MMD less informative for DCT. To mitigate this issue, we design a new measure of distributional discrepancy, norm-adaptive MMD (NAMMD), which scales the MMD value using the RKHS norms of distributions. Based on the asymptotic distribution of NAMMD, we propose NAMMD-based DCT to assess the closeness level of a distribution pair. Theoretically, we prove that NAMMD-based DCT has higher test power than MMD-based DCT while maintaining bounded type-I error. This is further validated by extensive experiments on multiple types of data, including synthetic noise and real images. Our code is available at https://github.com/zhijianzhouml/NAMMD.
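
For reference, the plain MMD that the paper starts from can be estimated as follows. This sketch covers only standard MMD with a Gaussian kernel on one-dimensional samples, using the biased (V-statistic) estimator; it does not implement NAMMD, whose RKHS-norm scaling is not spelled out in the abstract. The function names and the bandwidth default are assumptions.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))

def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two 1-D samples:
    mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)."""
    def mean_k(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)
```

The estimate is zero for identical samples and grows as the samples separate; the abstract's observation is that this value alone does not capture how close two distributions are in practice.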

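2606.08580 2026-06-09 eess.AS cs.SD New submission

G-MaP-SE: Guided Speech Enhancement via GMM-Based Prior Matching

Yike Zhu, Ziqian Wang, Zikai Liu, Xingchen Li, Zhuangqi Chen, Xianjun Xia, Chuanzeng Huang, Lei Xie

AI summary: Proposes the G-MaP-SE framework, which builds a prior over clean-speech embeddings with a Gaussian mixture model and improves speech enhancement by matching noisy-condition embeddings to it, approaching the ideal clean-condition upper bound without enrollment audio.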

Comments Accepted to Interspeech 2026

2606.07834 2026-06-09 cs.SE cs.AI cs.CL cs.MA New submission

Cherry-pick Override: Unsafe Directional Commitment in LLM Judges under Mixed Evidence

Haoran Xu

AI summary: For mixed-evidence settings, the paper finds that LLM judges wrongly return directional verdicts (SUPPORTS/REFUTES) instead of the authorized non-directional verdict (CONFLICTING), a failure it defines as Cherry-pick Override (CCO). Through a diagnostic protocol and intervention experiments, it proposes an external commitment-control layer that separates verdict generation from authorization.

Comments 12 pages, 1 figure

Abstract

LLM judges increasingly turn verdicts into system commitments. Under mixed evidence (claims with both supporting and refuting sources) this is unsafe: when the schema exposes CONFLICTING as the authorized non-directional verdict, returning SUPPORTS/REFUTES is an unauthorized directional commitment, a failure we name Cherry-pick Override (CCO). We define CCO under an explicit task contract and report it with a same-denominator diagnostic protocol paired with matched-coverage bootstrap and an apples-to-apples random-veto null. On AVeriTeC's Conflicting subset (N_C = 150), three-option judges return a directional verdict on more than 84% of mixed-evidence claims; under the typed schema, three-judge majority voting amplifies direction-on-conflict on AVeriTeC (0.887 vs. 0.840; 95% CI [+0.013, +0.080]) but does not replicate on VitaminC-Mixed. Walking an intervention ladder of common single-channel fixes (typed vocabulary, panel aggregation, confidence thresholding, validator-only filtering), each leaves a distinct residual failure: panel aggregation suppresses single-judge CONFLICTING dissent in 48% of CCO cases; the panel is well-calibrated for direction (ECE = 0.07 on pure-S/R) so confidence cannot operationally separate CCO from correct directional commits; validator-as-classifier nearly halves pure-evidence accuracy. A minimal two-channel reference probe reaches operating points neither single channel reaches; under the random-veto null its promotion to CONFLICTING is structurally targeted on AVeriTeC (empirical p < 1/2001) and weaker but in the same direction on VitaminC-Mixed, a selectivity result rather than a magnitude one. We argue for an external commitment-control layer that separates verdict generation from commitment authorization, using structural evidence and confidence as orthogonal channels and NO-COMMIT as a routed controller state.

2606.07693 2026-06-09 stat.ML cs.LG math.PR 新提交

Transfer learning for causal forest

迁移学习用于因果森林

Bérénice-Alexia Jocteur, Véronique Maume-Deschamps, Pierre Ribereau

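AI summary: Proposes a transfer learning method for the causal forest HTERF that uses an offset to estimate the model shift between the source and target domains and gives an upper bound on the CATE error on the target domain; simulations and real data support its effectiveness.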

AI中文摘要

迁移学习解决了从一个领域向另一个领域迁移知识的挑战。传统的迁移学习侧重于调整在源域（有大量观测）上训练的模型，以提高在目标域（观测较少）上的性能。在这项工作中，我们考虑模型偏移的情况，并专注于将迁移学习应用于因果森林，即HTERF。该因果森林旨在估计条件平均处理效应（CATE）。所考虑的方法是Wang（2016）提出的偏移量方法，经过调整以适应因果背景。该方法依赖于使用中间模型来估计源分布和目标分布之间的偏移量。我们的主要结果是基于中间模型的误差，给出了目标上HTERF的CATE误差的上界。仿真研究表明，该方法在不同设置下的仿真以及真实数据集上均表现出良好的性能。

英文摘要

Transfer learning addresses the challenge of transferring knowledge from one domain to another. Traditional transfer learning focuses on adapting models trained on a source domain (with many observations) to improve performance on a target domain (with few observations). In this work we consider the case of a model shift and focus on transfer learning applied to a causal forest, namely HTERF. This causal forest aims to estimate the Conditional Average Treatment Effect (CATE). The approach considered is the offset method presented by Wang (2016), adapted to a causal context. This method relies on intermediate models to estimate the offset between the source and target distributions. Our main result is a bound on the CATE error of HTERF on the target that depends on the error of the intermediate models. Simulation studies show the good performance of this approach in different settings, both on simulated data and on a real-world dataset.

2606.07575 2026-06-09 q-fin.RM cs.LG 新提交

Forward-Looking Stress Testing Under Macro Scenarios: Stable SVaR Estimation Using a Hybrid GPR-HS Framework with SACS

宏观情景下的前瞻性压力测试：基于混合GPR-HS框架与SACS的稳定SVaR估计

Ujjwala Vadrevu

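AI summary: Extends the hybrid Gaussian Process Regression Historical Simulation framework to forward-looking stress scenarios and proposes Scenario-Averaged Covariance Stabilization, obtaining stable Stressed Value-at-Risk estimates under three macro scenarios that meet regulatory requirements.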

Comments 15 pages, 3 figures. Extension of a hybrid GPR-HS framework to forward-looking stress testing with scenario-based SVaR and covariance stabilization (SACS)

AI中文摘要

监管压力测试框架，包括全面资本分析与审查（CCAR）和内部资本充足评估程序（ICAAP），要求在前瞻性宏观经济情景下进行稳健的压力在险价值（SVaR）估计。传统的参数化方法在极端冲击下常表现出数值不稳定性，降低了资本预测的可靠性。本文将Vadrevu（2026）的混合高斯过程回归历史模拟（GPR-HS）框架扩展到前瞻性压力情景，展示了在三种情景（西亚战争、气候风险和AI泡沫/监管）下的稳定性。一个关键贡献是情景平均协方差稳定（SACS）框架，它将压力协方差构建为历史危机情景的加权聚合，提供稳定且可解释的依赖结构。压力收益路径通过确定性漂移和随机残差在252天的时间跨度内生成，而波动率通过具有激进噪声初始化（ANI）的高斯过程回归建模。该框架在所有资产和情景下表现出一致的收敛性。SVaR范围从-2.1020%到-2.2231%，并且保持了|SES| > |SVaR|的一致性属性。结果支持GPR-HS与SACS作为CCAR和ICAAP应用中前瞻性SVaR和SES估计的稳定且符合监管要求的方法。

英文摘要

Regulatory stress testing frameworks, including the Comprehensive Capital Analysis and Review (CCAR) and the Internal Capital Adequacy Assessment Process (ICAAP), require robust Stressed Value-at-Risk (SVaR) estimation under forward-looking macroeconomic scenarios. Traditional parametric approaches often exhibit numerical instability under extreme shocks, reducing the reliability of capital projections. This paper extends the Hybrid Gaussian Process Regression Historical Simulation (GPR-HS) framework of Vadrevu (2026) to forward-looking stress scenarios, demonstrating stability across three regimes: West Asia War, Climate Risk, and AI Bubble/Regulation. A key contribution is the Scenario-Averaged Covariance Stabilization (SACS) framework, which constructs stress covariance as a weighted aggregation of historical crisis regimes, providing stable and interpretable dependence structures. Stressed return paths are generated over a 252-day horizon using deterministic drift and stochastic residuals, while volatility is modeled via Gaussian Process Regression with Aggressive Noise Initialization (ANI). The framework exhibits consistent convergence across all assets and scenarios. SVaR ranges from -2.1020% to -2.2231%, with the coherence property |SES| > |SVaR| preserved. The results support GPR-HS with SACS as a stable and regulator-aligned approach for forward-looking SVaR and SES estimation in CCAR and ICAAP applications.
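The weighted aggregation of crisis-regime covariances described above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the regime covariances are synthetic, the weights (0.5, 0.3, 0.2) and the equal-weight portfolio are hypothetical, and the tail measures are plain empirical quantiles rather than the paper's GPR-based pipeline.

```python
import numpy as np

def scenario_averaged_covariance(regime_covs, weights):
    """Convex combination of per-regime covariance matrices: sum_k w_k * Sigma_k."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    covs = np.asarray(regime_covs, dtype=float)  # shape (K, d, d)
    return np.tensordot(w, covs, axes=1)         # shape (d, d)

def var_and_es(returns, alpha=0.01):
    """Empirical left-tail VaR (alpha-quantile) and expected shortfall."""
    r = np.asarray(returns, dtype=float)
    var = np.quantile(r, alpha)
    es = r[r <= var].mean()  # mean of the returns at or below the VaR level
    return var, es

# Hypothetical inputs: three synthetic "crisis regimes" for three assets.
rng = np.random.default_rng(0)
regime_covs = [np.cov(rng.normal(scale=s, size=(3, 500))) for s in (0.01, 0.02, 0.03)]
stress_cov = scenario_averaged_covariance(regime_covs, [0.5, 0.3, 0.2])

# Equal-weight portfolio returns simulated under the aggregated covariance.
portfolio = rng.multivariate_normal(np.zeros(3), stress_cov, size=10_000) @ np.full(3, 1 / 3)
svar, ses = var_and_es(portfolio)
```

Because the weights are non-negative and sum to one, the aggregate stays symmetric positive semidefinite whenever each regime covariance is, which is one plausible reading of why such an average is stable. The expected shortfall is never above the VaR quantile by construction, matching the |SES| > |SVaR| ordering quoted in the abstract.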

2606.07536 2026-06-09 cs.CY cs.AI 新提交

Beware of Geeks Bearing Gifts: Building True EU Frontier AI Sovereignty

警惕带来礼物的极客：构建真正的欧盟前沿人工智能主权

Nick Moës, Toni Lorente, Amin Oueslati, Jonathan Smith, Robin Staes-Polet, Radina Kraeva

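AI summary: Proposes a framework that links five sovereignty pillars (economic competitiveness, resilience, security and defence, European values, and foreign relations) to a decomposition of the frontier AI stack into five layers, 26 components, and 29 sub-components, used to identify critical gaps, redundancies, and trade-offs in EU policy in support of strategic autonomy.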

AI中文摘要

前沿人工智能正在重塑社会的方方面面，从经济产出或军事能力到民主制度。欧盟正从一个结构性依赖的位置进入这一转型：前沿模型几乎全部来自美国或中国，美国拥有约欧盟16倍的人工智能超级计算能力，全球超大规模数据中心容量中仅有15%位于欧盟境内。尽管欧盟委员会已加速其政策响应，现有举措仍然分散，缺乏确保整个前沿人工智能价值链战略自主的统一愿景。在此，我们提出了一个统一框架，将五大主权支柱（经济竞争力、韧性、安全与国防、欧洲价值观和对外关系）与前沿人工智能堆栈的分解联系起来，该堆栈包括五层、26个组件和29个子组件。该框架能够识别当前欧盟政策中隐含的关键差距、冗余和跨支柱权衡。我们对人工智能千兆工厂倡议的分析表明，以主权为中心的视角如何揭示狭隘经济框架所掩盖的冲突。此外，该框架为政策制定者提供了结构化基础，用于设计、评估和优先考虑跨欧洲战略自主多个维度的前沿人工智能干预措施，涵盖我们识别的四大委员会通讯中的92项倡议及其他。

英文摘要

Frontier artificial intelligence is reshaping all aspects of society, from economic output and military capability to democratic institutions. The EU is entering this transformation from a position of structural dependence: frontier models originate almost exclusively from the United States or China, the US holds approximately sixteen times the EU's AI supercomputing capacity, and only 15% of global hyperscale data centre capacity resides within EU borders. Although the European Commission has accelerated its policy response, existing initiatives remain fragmented and lack a cohesive vision for securing strategic autonomy across the full frontier AI value chain. Here we propose a unified framework connecting five sovereignty pillars (economic competitiveness, resilience, security and defence, European values, and foreign relations) to a decomposition of the frontier AI stack comprising five layers, 26 components, and 29 sub-components. This framework allows the identification of critical gaps, redundancies, and inter-pillar trade-offs that current EU policy leaves implicit. Our analysis of the AI Gigafactory Initiative illustrates how a sovereignty-centred lens reveals conflicts that narrowly economic framings obscure. Moreover, this framework offers policymakers a structured basis for designing, evaluating, and prioritising frontier AI interventions across multiple dimensions of European strategic autonomy, covering the 92 initiatives from four major Commission communications we identify, and beyond.

2606.09829 2026-06-09 eess.SP 新提交

Adaptive Derivative Estimation via Stein's Unbiased Risk

基于Stein无偏风险的自适应导数估计

Yonathan Murin, Ali Ozer Ercan

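AI summary: Proposes SURDE, which uses Stein's unbiased risk estimate to evaluate candidate filter lengths and soft-combines their outputs, handling the noise-bias tradeoff of causal FIR derivative filtering; proves minimax optimality and outperforms ICI and AWVE on simulated and real data.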

Comments Submitted to IEEE Transactions on Signal Processing, 23 pages

AI中文摘要

从含噪采样数据中估计导数对于控制、人机交互和生物医学工程至关重要。因果FIR导数滤波器为此提供了一种自然方法，但其性能取决于滤波器长度。短滤波器放大噪声，长滤波器引入平滑偏差。我们提出SURDE（SURE导数估计器），通过在一组候选长度上评估基于Stein无偏风险估计（SURE）的数据驱动代价，并利用指数加权软组合它们的输出，在每个时间步解决这一权衡。我们证明了软组合估计器的极小极大最优预言不等式，并据此推导出最优加权温度的闭式解。因此，SURDE唯一的调参参数是噪声方差。通过数值模拟，我们展示了SURDE在一阶导数估计中始终优于替代自适应方法（置信区间交集（ICI）规则和自适应窗口速度估计器（AWVE））。我们进一步表明SURDE对噪声方差误设具有鲁棒性（在4倍范围内性能下降9%），并且在真实数据场景（EuRoC MAV数据集）中也优于ICI和AWVE。SURDE是因果的、计算轻量，且仅需噪声方差的粗略估计。

英文摘要

Estimating derivatives from noisy sampled data is fundamental to control, human--computer interaction, and biomedical engineering. Causal FIR derivative filters offer a natural approach to this challenge, yet their performance depends on their length. While short filters amplify noise, long filters introduce smoothing bias. We present SURDE (SURE Derivative Estimator), which addresses this tradeoff at each time step by evaluating a data-driven cost derived from Stein's Unbiased Risk Estimator (SURE) across a bank of candidate lengths and soft-combining their outputs via exponential weighting. We prove a minimax-optimal oracle inequality for the soft-combined estimator and use it to derive the optimal weighting temperature in closed form. Thus, the only tuning parameter for SURDE is the noise variance. Via numerical simulations we show that SURDE consistently outperforms alternative adaptive methods (the Intersection of Confidence Intervals (ICI) rule and the Adaptive Windowing Velocity Estimator (AWVE)) for first-derivative estimation. We further show that SURDE is robust to noise-variance misspecification (9% degradation over a $4\times$ range), and that it is also superior to ICI and AWVE in real-data scenarios (the EuRoC MAV dataset). SURDE is causal, computationally light, and requires only a rough estimate of the noise variance.
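The general idea of scoring a bank of causal window lengths and soft-combining their outputs with exponential weights can be sketched as follows. This is not the authors' SURDE: the abstract does not give its cost or its closed-form temperature, so the sketch substitutes a textbook SURE for a causal least-squares line fit (two fitted parameters per window) and uses a hypothetical default temperature equal to the noise variance. The function names and candidate lengths are likewise illustrative assumptions.

```python
import numpy as np

def causal_slope_and_cost(window, dt, sigma2):
    """Least-squares line fit over a causal window.

    Returns the fitted slope and a per-sample SURE of the fitted signal.
    The cost is a stand-in, NOT the cost used in the paper.
    """
    L = len(window)
    t = (np.arange(L) - (L - 1)) * dt            # times ending at the current sample
    A = np.column_stack([np.ones(L), t])
    coef, *_ = np.linalg.lstsq(A, window, rcond=None)
    rss = float(np.sum((window - A @ coef) ** 2))
    cost = rss / L + 2.0 * sigma2 * 2 / L - sigma2  # SURE with trace(hat matrix) = 2
    return coef[1], cost

def soft_combined_derivative(y, dt, sigma2, lengths=(5, 9, 17, 33), temperature=None):
    """Exponentially weighted combination of slopes from several window lengths."""
    y = np.asarray(y, dtype=float)
    if temperature is None:
        temperature = sigma2                     # hypothetical default, not the paper's value
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    out = np.full(len(y), np.nan)                # undefined until the longest window fits
    for n in range(max(lengths) - 1, len(y)):
        slopes, costs = zip(*(causal_slope_and_cost(y[n - L + 1:n + 1], dt, sigma2)
                              for L in lengths))
        costs = np.array(costs)
        w = np.exp(-(costs - costs.min()) / temperature)  # shift by min for stability
        out[n] = np.dot(w / w.sum(), slopes)
    return out
```

A short window gives a noisy slope with a small bias, while a long window smooths more but lags on curved signals; the weights shift toward whichever length currently has the lower estimated risk instead of committing to a single length.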

2606.09823 2026-06-09 cond-mat.str-el 新提交

Topological Triplons in the Pinwheel Valence Bond Solid on the Kagome Lattice

Kagome晶格上风车价键固体的拓扑三线态激发

Laura Calonge-Martínez, Peng Rao, Frédéric Mila, Johannes Knolle

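AI summary: Studies the triplon excitations of the pinwheel valence bond solid in the deformed kagome compound Rb2Cu3SnF12, using bond-operator mean-field theory to compute the bands, dynamical structure factor, Berry curvature, and thermal Hall effect, and finds that the Dzyaloshinskii-Moriya interaction and an external magnetic field give the triplon bands nontrivial Chern numbers.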

2606.09819 2026-06-09 gr-qc astro-ph.CO 新提交

Linear Ricci-Trace Deformations and Operational Equivalence in Rastall-Type Gravity

线性Ricci迹变形与Rastall型引力中的操作等价性

José A. C. Nogales, Karen-Luz Burgoa Rosso, Marcelo H. Alavarenga

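AI summary: Analyzes linear Ricci-trace deformations of Einstein's field equations, classifies the field equations and calibrates the parameters, shows that two commonly used parametrizations are algebraically isomorphic only when the deformation parameter and the gravitational coupling are transformed simultaneously yet are not operationally equivalent, and distinguishes the class from unimodular gravity.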

Comments 23 pages, 0 figure

AI中文摘要

我们分析了一类爱因斯坦场方程的线性Ricci迹变形，其中Ricci张量与标量曲率迹部分之间的相对权重被修改，而度规仍然是唯一的引力场。分析的目的在于结构而非现象学：我们对相应的场方程类进行分类，固定Rastall引力文献中常用的参数字典，并确定牛顿校准后哪些等价陈述仍然成立。我们证明两种常用的参数化，\[ (1-ε)R_{μν}-\frac12 g_{μν}R=κ_εT_{μν}, \qquad R_{μν}-\frac{1-λ}{2}g_{μν}R=κ_λT_{μν}, \] 仅当变形参数和裸引力耦合同时变换时才是代数同构的。然而，这种代数等价并非自动的操作等价。一旦固定相同的实验室应力张量和相同的测量牛顿常数，参数映射仅在爱因斯坦点处是被动重参数化。我们进一步将λ代表式识别为标准Rastall方程，阐明守恒有效源的作用，推导相应的FLRW理想流体扇区，并讨论退化情况，包括真空、无迹物质、辐射、尘埃和奇异无迹点。最后，我们将Ricci迹类与单模引力区分开：尽管两者都涉及迹扇区，但单模引力源于受限变分原理，并将宇宙常数作为积分常数产生，而非来自代数Ricci迹变形。结果是对Rastall型Ricci迹模型的紧凑操作分类。

英文摘要

We analyze a class of linear Ricci--trace deformations of Einstein's field equations in which the relative weight between the Ricci tensor and the scalar-curvature trace sector is modified while the metric remains the only gravitational field. The purpose of the analysis is structural rather than phenomenological: we classify the corresponding field-equation class, fix the parameter dictionaries commonly used in the Rastall-gravity literature, and identify which equivalence statements survive after Newtonian calibration. We show that two frequently used parametrizations, \[ (1-ε)R_{μν}-\frac12 g_{μν}R=κ_εT_{μν}, \qquad R_{μν}-\frac{1-λ}{2}g_{μν}R=κ_λT_{μν}, \] are algebraically isomorphic only if both the deformation parameter and the bare gravitational coupling are transformed simultaneously. This algebraic equivalence, however, is not automatically an operational equivalence. Once the same laboratory stress tensor and the same measured Newton constant are fixed, the parameter map is a passive reparametrization only at the Einstein point. We further identify the $λ$-representative with the standard Rastall equation, clarify the role of the conserved effective source, derive the corresponding FLRW perfect-fluid sector, and discuss degenerate cases including vacuum, trace-free matter, radiation, dust, and the singular traceless point. Finally, we distinguish the Ricci--trace class from Unimodular Gravity (UG): although both involve the trace sector, UG follows from a restricted variational principle and produces the cosmological constant as an integration constant, rather than from an algebraic Ricci--trace deformation. The result is a compact operational classification of Rastall-type Ricci--trace models.
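The statement that the two parametrizations match only when the deformation parameter and the coupling are transformed together can be checked by elementary algebra on the two displayed equations. The map below is read off from those equations alone, assuming $ε \neq 1$; it is not quoted from the paper, whose own parameter dictionary the abstract does not state.

```latex
% Divide the ε-form by (1-ε), assuming ε ≠ 1:
\[
(1-ε)R_{μν}-\frac12 g_{μν}R=κ_εT_{μν}
\;\Longrightarrow\;
R_{μν}-\frac{1}{2(1-ε)}g_{μν}R=\frac{κ_ε}{1-ε}T_{μν}.
\]
% Match term by term with R_{μν}-\frac{1-λ}{2}g_{μν}R=κ_λT_{μν}:
\[
1-λ=\frac{1}{1-ε}
\;\Longrightarrow\;
λ=-\frac{ε}{1-ε},
\qquad
κ_λ=\frac{κ_ε}{1-ε}.
\]
```

At $ε=0$ this gives $λ=0$ and $κ_λ=κ_ε$, which is the Einstein point. Away from it the coupling is rescaled along with the parameter, which is consistent with the abstract's remark that the map is a passive reparametrization only at the Einstein point once the measured Newton constant is held fixed.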

2606.09818 2026-06-09 math.NT 新提交

A note on large values of Dirichlet $L$-functions for characters of fixed order at $1/2<σ\leq 1$

关于固定阶特征在 $1/2<σ\leq 1$ 处 Dirichlet $L$-函数大值的一个注记

Youness Lamzouri

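AI summary: Uses a simple argument to show that Dirichlet $L$-functions of primitive characters of fixed order attain large values of the conjecturally sharp size at $σ\in (1/2,1]$, with explicit constants.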

Comments 7 pages

2606.09815 2026-06-09 math.OC math.PR 新提交

Limit Theory for $N$-Player $α$-Potential Games

$N$ 玩家 $\alpha$-势博弈的极限理论

Xin Guo, Meng Wang, Yufei Zhang

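AI summary: Studies the limiting behavior of $N$-player $\alpha$-potential games as $N\to\infty$, proves convergence to potential mean field games, establishes the equivalence of $\lim_{N\to\infty}\alpha_N=0$ with the existing conditions for potential mean field games, and constructs the potential function using differential geometry in Wasserstein space.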

AI中文摘要

$\alpha$-势博弈框架最近被引入作为分析有限玩家动态博弈的工具，将寻找近似纳什均衡的挑战性任务简化为最小化单个函数（称为 $\alpha$-势函数）的控制问题。本文研究了当玩家数量 $N$ 趋于无穷时 $\alpha$-势博弈的极限行为。我们证明势平均场博弈（MFGs）自然地作为这一极限出现。具体地，归一化的 $N$ 玩家 $\alpha_N$-势函数的最优值和极小化子都收敛到具有测度值控制的平均场控制（MFC）问题的最优值和极小化子。我们建立了 $\lim_{N\to\infty}\alpha_N= 0$ 与势 MFGs 现有条件的等价性，并提供了利用 Wasserstein 空间微分几何技术构造 MFGs 势函数的统一方法。我们进一步证明极限 MFC 问题的目标函数作为相应 MFGs 的势函数，这是有限玩家情形的推广。这一联系通过渐近条件 $\lim_{N\to \infty}\alpha_N= 0$ 从有限玩家博弈得到了势 MFGs 的新构造。作为副产品，我们建立了对于具有共同噪声和非可分离控制相互作用的受控扩散，收敛到 MFGs 的 $N$ 玩家博弈的混沌传播。

英文摘要

The framework of $α$-potential games has recently been introduced as a tool to analyze finite-player dynamic games, reducing the challenging task of finding approximate Nash equilibria to a control problem of minimizing a single function called the $α$-potential function. In this work, we investigate the limiting behavior of $α$-potential games as the number of players $N$ tends to infinity. We show that potential mean field games (MFGs) arise naturally as this limit. Specifically, both the optimal values and the minimizers of normalized $N$-player $α_N$-potential functions converge to those of a mean field control (MFC) problem with measure-valued controls. We establish the equivalence of $\lim_{N\to\infty}α_N= 0$ with the existing conditions for potential MFGs, and provide a unified approach to construct the potential function for MFGs using techniques from differential geometry in Wasserstein space. We further demonstrate that the objective of the limiting MFC problem serves as a potential function for the corresponding MFGs, an extension of the analogous finite-player setting. This connection yields new constructions of potential MFGs from a finite-player game, through the asymptotic condition $\lim_{N\to \infty}α_N= 0$. As a by-product, we establish propagation of chaos for $N$-player games converging to MFGs for general controlled diffusions with common noise and non-separable control interactions.

2606.09812 2026-06-09 cond-mat.mes-hall cond-mat.str-el 新提交

Persistent currents, whirlpools, and local Chern markers in twisted TMD Chern insulators

扭转TMD Chern绝缘体中的持续电流、漩涡和局域Chern标记

Francesco Cioni, Lorenzo Cavicchi, Nazzareno Africani, Giacomo Mazza, Fabio Taddei, Amir Yacoby, Marco Polini

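AI summary: Studies persistent currents and whirlpools in twisted transition metal dichalcogenide homobilayers, proposes that the magnitude of the current density can serve as an accurate tracker of topological order, and analyzes the effect of finite size on the quantization of the Hall conductance.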

Comments 7 pages, 4 figures + Supplemental Material

2606.09810 2026-06-09 astro-ph.CO astro-ph.GA gr-qc 新提交

Inflationary interpretation of the gravitational-wave signal in the European Pulsar Timing Array DR2 with constraints

欧洲脉冲星计时阵列DR2引力波信号的暴胀解释及约束

Philippe Turgeon, Chiara Caprini, Anton Chudaykin, Martin Kunz, Delphine Perrodin, Ismael Cognard, Lucas Guillemot, Gilles Theureau

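AI summary: Interprets the EPTA DR2 gravitational-wave background signal with inflationary models by parametrizing the tensor power spectrum and combining CMB, BBN, and LIGO-Virgo-KAGRA constraints to obtain the allowed parameter ranges; the signal may come from tensor modes that re-entered the Hubble radius during the radiation-dominated era, but this requires a very low reheating temperature.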

AI中文摘要

欧洲脉冲星计时阵列（EPTA）合作的第二次数据发布提供了引力波（GW）背景存在的证据。在这项工作中，我们探索了该信号在暴胀情景下的潜在宇宙学解释。我们将张量功率谱参数化为张量标量比 $r$、张量谱指数 $n_t$、重加热温度 $T_{\text{rh}}$ 和截止频率 $f_{\text{end}}$。我们纳入了所有相关的观测约束，包括来自宇宙微波背景、大爆炸核合成和LIGO-Virgo-KAGRA观测的约束。我们证明，一致地施加这些约束将提供EPTA信号可行解释的参数空间区域缩小到：在95%置信水平下，$-11.66 \lesssim \log_{10}r \lesssim -1.45$，$1.32 \lesssim n_t \lesssim 2.47$，$1.78\text{ MeV} \lesssim T_{\text{rh}} \lesssim 28.2\text{ GeV}$，以及$75.86\text{ nHz} \lesssim f_{\text{end}} \lesssim 14.45\text{ Hz}$。这有利于EPTA频段内的GW谱起源于辐射主导时期重新进入哈勃半径的张量模式的场景，允许更高的$r$和更平坦的谱。然而，$T_{\text{rh}}$必须取非常低的值，这在理论上难以解释。

英文摘要

The second data release of the European Pulsar Timing Array (EPTA) collaboration provides evidence for the presence of a gravitational-wave (GW) background. In this work, we explore a potential cosmological interpretation of this signal in terms of inflationary scenarios. We parametrize the tensor power spectrum in terms of the tensor-to-scalar ratio $r$, the tensor spectral index $n_t$, the reheating temperature $T_{\text{rh}}$, and the cut-off frequency $f_{\text{end}}$. We incorporate all relevant observational constraints, including those from the Cosmic Microwave Background, Big Bang Nucleosynthesis, and LIGO-Virgo-KAGRA observations. We demonstrate that imposing these constraints consistently reduces the region of parameter space that provides a viable interpretation of the EPTA signal, to $-11.66 \lesssim \log_{10}r \lesssim -1.45$, $1.32 \lesssim n_t \lesssim 2.47$, $1.78\text{ MeV} \lesssim T_{\text{rh}} \lesssim 28.2\text{ GeV}$, and $75.86\text{ nHz} \lesssim f_{\text{end}} \lesssim 14.45\text{ Hz}$ at the 95% confidence level. This favours the scenario in which the GW spectrum in the EPTA frequency band originates from tensor modes that re-entered the Hubble radius during the radiation-dominated era, allowing for a higher $r$ and a flatter spectrum. However, $T_{\text{rh}}$ must take very low values, which are challenging to explain theoretically.

2606.09808 2026-06-09 math.AP 新提交

High Mach number limit of the compressible Navier--Stokes equations in critical Besov spaces

临界Besov空间中可压缩Navier-Stokes方程的高马赫数极限

Jinkai Ni, Zhipeng Zhang

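AI summary: Studies the high Mach number limit of the compressible Navier-Stokes system in the critical Besov framework, proves global well-posedness for small initial data via a parameter-dependent lower-order estimate, recovers a global strong solution of the pressureless Navier-Stokes system, and derives quantitative error estimates for d≥3.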

Comments 30 pages

AI中文摘要

我们在临界Besov框架下研究缩放的可压缩Navier-Stokes系统的高马赫数极限。在缩放的动量方程中，压力力由项$\varepsilon^2\nabla a^\varepsilon$表示，其中$\varepsilon$是马赫数的倒数；当$\varepsilon\to0$时，形式极限系统是可压缩无压Navier-Stokes系统。由于极限模型中缺乏密度耗散以及粘性项产生的高阶耦合，分析变得复杂。对于$d\geq2$，我们证明了小初始数据下缩放系统的整体适定性，并获得了关于$\varepsilon$一致的估计。一个关键因素是参数依赖的低阶估计$\varepsilon a^\varepsilon$，它补偿了密度方程的纯输运性质，使得一致界得以闭合。基于这些估计，我们证明了高马赫数极限，并恢复了无压Navier-Stokes系统的全局强解。对于$d\geq3$，我们进一步推导了缩放解与无压极限解之间的定量误差估计。更精确地，在每个固定的有限时间区间上，如果初始差异为$\mathcal{O}(\varepsilon)$量级，那么相应的低阶临界Besov误差具有相同的阶，从而给出了无压极限的定量证明。

英文摘要

We investigate the high Mach number limit for the scaled compressible Navier--Stokes system in the critical Besov framework. In the scaled momentum equation, the pressure force is represented by the term $\varepsilon^2\nabla a^\varepsilon$, where $\varepsilon$ is the inverse Mach number; as $\varepsilon\to0$, the formal limiting system is the compressible pressureless Navier--Stokes system. The analysis is complicated by the absence of density dissipation in the limiting model and by the highest-order coupling created by the viscous terms. For $d\geq2$, we prove the global well-posedness of the scaled system for small initial data and obtain estimates that are uniform with respect to $\varepsilon$. A crucial ingredient is a parameter-dependent lower-order estimate for $\varepsilon a^\varepsilon$, which compensates for the purely transport nature of the density equation and allows the uniform bounds to be closed. Based on these estimates, we justify the high Mach number limit and recover a global strong solution to the pressureless Navier--Stokes system. For $d\geq3$, we further derive quantitative error estimates between the scaled solutions and the pressureless limiting solution. More precisely, on each fixed finite time interval, if the initial discrepancy is of order $\mathcal{O}(\varepsilon)$, then the corresponding lower-order critical Besov error satisfies the same rate, which yields a quantitative justification of the pressureless limit.

2606.09807 2026-06-09 math.CV math.CA 新提交

Pathological function spaces and an unsolved Analysis I problem

病态函数空间与一个未解决的数学分析 I 问题

Thomas Ransford

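AI summary: Studies a class of pathological spaces of holomorphic functions on the unit disk containing functions that can be approximated by polynomials but not by the partial sums of their Taylor series, which leads to a series-convergence problem that remains unsolved.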

Comments 18 pages