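arXivDaily daily academic digest: synced with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.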

2606.19212 2026-06-18 stat.ML cs.LG New submission

Generalised Eigenvalue Geometry of Semantic Adversarial Attacks

Martin Anthony, Kaveh Salehzadeh Nobari

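AI summary: Proposes a continuous local model that quantifies semantic adversarial attackability through the largest generalised eigenvalue of a matrix pencil $(A,B)$, and derives a prediction-flip condition, attackability certificates, and a VC bound.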

Abstract

Recent empirical work shows that semantically equivalent paraphrases can fool financial sentiment classifiers: although a paraphrase remains close to the original under a strong reference embedding, it may shift the target model's representation enough to change the predicted class. Existing robustness theory either assumes a single-model threat model or focuses mainly on empirical attack algorithms. We develop a continuous local model of semantic paraphrase perturbations that captures this two-model structure. We show that the worst-case local displacement of the target representation, subject to a proxy-model budget, is governed by the largest generalised eigenvalue of a matrix pencil $(A,B)$ constructed from the Jacobians of the two embedding maps. The resulting attackability index $\lambda^*(x)$ is intrinsic to the local paraphrase geometry and the chosen embedders, yields a closed-form prediction-flip condition for affine readouts, and supports conservative population and finite-sample attackability certificates. For uniform control over classes of affine readouts, we derive a distribution-free VC bound for binary attackability indicators and a scale-sensitive margin bound based on an attackability-adjusted margin that subtracts a local geometric penalty from the standard classifier margin. We also connect the continuous theory to discrete paraphrase search, identify an asymmetry between successful and unsuccessful finite searches, and give a covering condition under which the discrete and continuous settings agree. Finally, we propose an empirical verification framework using soft-token relaxations and generated paraphrase sets to assess the local eigenvalue geometry, prediction-flip condition, and finite-search approximation on a deployed financial-text classifier.
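
The generalised eigenvalue statement can be illustrated with a small numerical sketch. It assumes, as one plausible reading of the abstract rather than the paper's stated construction, that $A = J_t^\top J_t$ and $B = J_p^\top J_p$ for target and proxy Jacobians $J_t$ and $J_p$, with $B$ positive definite. The function name and the random Jacobians are hypothetical.

```python
import numpy as np

def attackability_index(J_target, J_proxy):
    """Largest generalised eigenvalue of the pencil (A, B) and its eigenvector.

    Assumed construction (not taken from the paper):
    A = J_target^T J_target, B = J_proxy^T J_proxy, B positive definite.
    """
    A = J_target.T @ J_target
    B = J_proxy.T @ J_proxy
    # Reduce A v = lam * B v to a standard symmetric problem via B = L L^T.
    L = np.linalg.cholesky(B)
    L_inv = np.linalg.inv(L)
    C = L_inv @ A @ L_inv.T
    eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    lam = eigvals[-1]
    # Map the top eigenvector back: v = L^{-T} w, which gives v^T B v = 1.
    v = L_inv.T @ eigvecs[:, -1]
    return lam, v

rng = np.random.default_rng(0)
J_t = rng.normal(size=(5, 4))  # hypothetical target-model Jacobian
J_p = rng.normal(size=(6, 4))  # hypothetical proxy-model Jacobian
lam, v = attackability_index(J_t, J_p)

eps = 0.1          # proxy-model budget: ||J_p @ delta|| <= eps
delta = eps * v    # direction that maximises ||J_t @ delta|| under the budget
```

Under these assumptions the squared target displacement at budget `eps` equals `eps**2 * lam`, so a larger index means a larger worst-case shift for the same proxy-space distance.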

2606.19157 2026-06-18 eess.AS cs.CL New submission

IndicContextEval: A Benchmark for Evaluating Context Utilisation in Audio Large Language Models Across 8 Indic Languages

Sakshi Joshi, Dhruv Subhash Rathi, Sanskar Singh, Eldho Ittan George, R J Hari, Kaushal Bhogale, Mitesh M. Khapra

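AI summary: Introduces the IndicContextEval benchmark, comprising 56 hours of natural speech from 555 speakers across 8 Indian languages, and uses a 7-level prompting framework to evaluate whether AudioLLMs genuinely use context rather than relying on parametric knowledge.

Comments: Accepted at Interspeech 2026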

Abstract

AudioLLMs enable speech recognition conditioned on textual prompts such as domain descriptions or entity lists. However, it remains unclear whether these models genuinely utilise such context or rely on parametric knowledge learned during pretraining. Existing benchmarks cannot answer this question because they evaluate transcription under fixed prompting conditions and rarely include explicit contextual inputs. We introduce IndicContextEval, a 56-hour multilingual benchmark of natural speech from 555 speakers across 8 Indian languages and 23 professional domains. We design a 7-level prompting framework that progressively introduces contextual signals, including metadata, natural-language descriptions, entity lists in English and native script, and adversarial prompts with incorrect entities. Evaluating five models reveals substantial differences in context utilisation behaviour, highlighting the need for explicit evaluation of contextual grounding in AudioLLMs.

2606.19117 2026-06-18 stat.ME cs.LG econ.EM stat.ML New submission

Wasserstein Policy Learning for Distributional Outcomes

Yiyan Huang, Cheuk Hang Leung, Qi Wu, Zhiheng Zhang

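AI summary: For distribution-valued outcomes, proposes a policy learning framework based on Wasserstein barycenters and utility functionals, using IPW and DR estimators; proves that the regret rate is dominated by policy-class complexity and gives a minimax lower bound.

Comments: Accepted by The 39th Annual Conference on Learning Theory (COLT 2026)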

Abstract

Offline policy learning has received growing attention in causal inference. The primary objective is to learn a policy (individualized treatment rule) as a mapping from covariates to treatment that maximizes the empirical welfare defined as the mean of scalar-valued potential outcomes. In this paper, we study offline policy learning with distribution-valued outcomes, where each potential outcome is a probability measure on $\mathbb{R}$ and the reward is defined through a utility functional applied to the Wasserstein barycenter of induced outcome distributions. We establish statistical guarantees for the policy learning framework based on both Inverse Probability Weighting (IPW) and Doubly Robust (DR) estimators. By handling the challenging uniform deviation over the product of the combinatorial policy class and the infinite-dimensional quantile domain, we prove that the finite-sample regret has leading dependence $\widetilde{\mathcal{O}}(\sqrt{\mathrm{N\text{-}dim}(\Pi)/N})$. In the one-dimensional Wasserstein setting and under the stated regularity conditions, the leading regret rate is still governed by the policy-class complexity. Moreover, we provide a minimax lower bound establishing the sharpness of the leading dependence on $N$ and $\mathrm{N\text{-}dim}(\Pi)$.
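
The role of the quantile domain in the one-dimensional setting can be seen in a short sketch. On $\mathbb{R}$, the 2-Wasserstein barycenter has a quantile function equal to the weighted average of the input quantile functions, so for empirical distributions of equal size it reduces to averaging order statistics. The function name and inputs below are hypothetical, and the weights are generic; treating them as inverse-propensity weights would be an assumption, not the paper's estimator.

```python
def barycenter_1d(samples, weights):
    """Order statistics of the 2-Wasserstein barycenter of 1D empirical
    distributions with equal sample sizes.

    samples: list of equally long lists of real numbers
    weights: one non-negative weight per distribution (normalised here)
    """
    total = sum(weights)
    sorted_samples = [sorted(s) for s in samples]
    n = len(sorted_samples[0])
    # Average the i-th smallest values across distributions.
    return [
        sum(w * s[i] for w, s in zip(weights, sorted_samples)) / total
        for i in range(n)
    ]

# Two hypothetical outcome distributions, each observed through three draws.
equal = barycenter_1d([[0, 2, 1], [10, 12, 11]], [1, 1])
skewed = barycenter_1d([[0, 2, 1], [10, 12, 11]], [3, 1])
```

A utility functional, such as the mean or a quantile of the returned values, can then be applied to the barycenter to obtain a scalar reward.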

2606.19101 2026-06-18 eess.SP cs.LG New submission

Structure Over Nonlinearity: Explicit Interaction Architectures for Dynamical Learning

Augusto Sarti

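AI summary: Proposes explicit dynamical units built on wave-inspired interaction structures, obtaining modeling capability from structured organization rather than expressive nonlinearity; in nonlinear system identification, depth improves representation quality and generalization.

Comments: 11 pages, 2 figures, 2 tables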

Abstract

Most learning architectures for dynamical systems rely on generic nonlinear function approximation, often requiring high model complexity to capture structured behaviors. In this work, we propose an alternative paradigm in which modeling capability arises primarily from structure rather than from expressive nonlinearities. We introduce a class of explicit structured dynamical units based on wave-inspired interaction structures with internal state. Inspired by wave-based computational principles, the proposed units adopt a strictly causal organization that eliminates algebraic loops, yielding fully explicit models that can be evaluated without implicit solvers. Stacking such units produces layered dynamical architectures with emergent hierarchical behavior. Through experiments on a nonlinear system identification task, we show that depth improves both representation quality and generalization, even under limited parameter optimization. In particular, the proposed architectures produce informative internal representations even under readout-only fitting, indicating that useful dynamical structure emerges from the organization of interactions prior to substantial parameter optimization. These results suggest that structure-first design provides a viable and effective alternative to conventional black-box approaches for learning dynamical systems, highlighting the role of interaction structure as a primary source of model expressivity.

2606.19092 2026-06-18 stat.AP cs.LG New submission

Context-Aware Optimization of Follow-Up Intervals for Type 2 Diabetes Care Using Markov Decision Processes

Parisa Lotfibagha, Kristen Miller, William J. Gallagher, Elizabeth B. Selden, Muge Capan

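AI summary: Proposes a contextual Markov decision process model that uses electronic health record data to optimize personalized follow-up intervals for Type 2 Diabetes patients, identifies lower- and higher-risk subpopulations, and substantially lowers expected cumulative cost relative to fixed-interval policies.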

Abstract

Chronic disease management relies on regular patient-provider interactions to follow up on disease progression and control. For Type 2 Diabetes (T2D), current guidelines prescribe fixed time intervals between subsequent primary care visits for all patients, overlooking heterogeneity in clinical trajectories and patient characteristics. This study introduces a Contextual Markov Decision Process (CMDP) model to optimize subpopulation-specific follow-up interval decisions using Electronic Health Record (EHR) data from 22,154 T2D patients across 10 primary care clinics. Contexts are identified by: i) dimensionality reduction of variables representing the individual health trajectories utilizing Principal Component Analysis, and ii) assigning patients to contexts via principal components and additional patient-level features using clustering. Two distinct contexts emerged, representing a lower- and a higher-risk subpopulation. CMDP-derived policies recommend: (i) follow-up within 1 month if the lab value at the current visit is unmeasured; (ii) up to 3 months for elevated lab values or recent hospitalizations; and (iii) 6 to 12 months for sustained glycemic control, with shorter follow-up intervals for patients in the high-risk context. The optimal policies achieved lower expected cumulative cost than benchmarks (e.g., in the higher-comorbidity context, the CMDP policy reduced cost by about 34.8%, and in the lower-comorbidity context by about 6.4%, relative to an American Diabetes Association-like fixed interval follow-up policy). These findings demonstrate how context-aware approaches can inform adaptive follow-up strategies, and have the potential to advance chronic care management in primary care by synthesizing machine learning and probabilistic decision models.
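
For readers unfamiliar with how a cost-minimising policy is obtained from a Markov decision process, the following is a minimal value-iteration sketch. It is a generic textbook procedure, not the paper's CMDP. The two states, two actions, transition probabilities, and costs are hypothetical toy values chosen only for illustration; in a contextual setting one such model would be solved per context.

```python
def value_iteration(P, cost, gamma=0.95, tol=1e-10):
    """Minimise expected discounted cost.

    P[a][s][t]: probability of moving from state s to t under action a
    cost[a][s]: immediate cost of taking action a in state s
    Returns (optimal values per state, optimal action per state).
    """
    n_states = len(cost[0])
    n_actions = len(cost)
    V = [0.0] * n_states
    while True:
        Q = [
            [
                cost[a][s] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]
        V_new = [min(q) for q in Q]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new, [q.index(min(q)) for q in Q]
        V = V_new

# Hypothetical toy model. States: 0 = controlled, 1 = uncontrolled.
# Actions: 0 = long follow-up interval, 1 = short follow-up interval.
P = [
    [[0.8, 0.2], [0.1, 0.9]],  # long interval
    [[0.9, 0.1], [0.6, 0.4]],  # short interval
]
cost = [
    [1.0, 10.0],  # long interval: cheap visits, penalty when uncontrolled
    [3.0, 12.0],  # short interval: extra visit cost in both states
]
V, policy = value_iteration(P, cost)
```

With these toy numbers the resulting policy uses the long interval in the controlled state and the short interval in the uncontrolled state.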

2606.19081 2026-06-18 q-bio.NC cs.HC New submission

Retrieval-Based Brain Decoding by Alignment, not Complexity

Matteo Ciferri, Matteo Ferrante, Nicola Toschi

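AI summary: Through experiments across multiple datasets, shows that linear contrastive decoders outperform ridge regression and standard non-linear methods in brain decoding, indicating that decoding gains come more from the training objective than from architectural complexity.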

Abstract

A prominent theory in cognitive science suggests that concepts in the brain are organized as high-dimensional vectors, with semantic meaning captured by directions and relative angles in this space. Brain decoding is the effort of reconstructing or retrieving stimuli (or their representations) from neural activity and involves finding a function that approximates how the brain represents concepts. This motivates the investigation of contrastive objectives as biologically plausible candidates to reverse the brain loss function. In this work, we study how functional MRI (fMRI) activity can generally be mapped with the embedding spaces of foundation models in vision, language, and audio. Although neural computations are highly non-linear at the microscale, fMRI measurements average signals across space and time, further smoothed by noise, effectively linearizing the observable representation. Consistent with these views, our experiments across multiple datasets demonstrate that linear contrastive decoders consistently outperform ridge regression and standard non-linear alternatives, and that these results generalize across images, text, and sound. These findings indicate that decoding gains arise more from the choice of training objective than from architectural complexity, pointing to contrastive-linear models as a principled strategy for brain decoding.
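
To make the contrast between a regression objective and a contrastive one concrete, here is a sketch of an InfoNCE-style retrieval loss evaluated on decoder outputs. The specific loss form, the temperature, and the random embeddings are assumptions for illustration; the abstract does not specify the exact objective used.

```python
import numpy as np

def info_nce(pred, target, temperature=0.1):
    """InfoNCE-style loss: row i of pred should retrieve row i of target.

    pred:   (n, k) decoder outputs, e.g. X @ W for a linear decoder
    target: (n, k) stimulus embeddings from a foundation model
    """
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    target = target / np.linalg.norm(target, axis=1, keepdims=True)
    logits = pred @ target.T / temperature           # cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Negative log-probability assigned to the matching pair, averaged.
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 4))            # hypothetical stimulus embeddings
loss_aligned = info_nce(emb, emb)        # perfectly aligned predictions
loss_shuffled = info_nce(emb, emb[::-1]) # every prediction paired with a wrong target
```

Unlike ridge regression, which penalises the squared distance to each target independently, this loss only rewards ranking the correct target above the others in the batch, which matches the retrieval setting in the title.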

2606.19057 2026-06-18 stat.ML cs.LG stat.CO stat.ME New submission

Quantifying and Auditing LLM Evaluation via Positive-Unlabeled Learning

Zilong Zhang, Yi-Ting Hung, Lei Ding, Chi-Kuang Yeh

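AI summary: To address systematic biases of LLM judges (such as verbosity bias), proposes a geometric auditing framework based on partial optimal transport that uses a small set of human-verified positives to correct the bias, improving agreement with human preferences without retraining.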

Abstract

Large Language Models (LLMs) are increasingly used as judges for scalable evaluation, yet such LLM-as-a-Judge systems exhibit systematic biases that are decoupled from semantic quality, most notably verbosity bias. Meanwhile, human supervision is costly and typically selective, yielding reliable positive judgments but leaving most outputs unlabelled and potentially mixed in quality. We formulate LLM evaluation under selective human supervision as a positive-unlabelled learning problem and propose a geometric auditing framework based on Partial Optimal Transport. By aligning a small set of human-verified positives with a reliable subset of unlabelled outputs in a fixed embedding space, our method identifies human-consistent preferences and corrects biased judges without retraining. Experiments demonstrate improved alignment with human preferences, increased robustness to presentation biases, and interpretable confidence estimates, offering a scalable and statistically grounded alternative to existing LLM-as-a-judge pipelines.

2606.18993 2026-06-18 stat.ML cs.LG stat.ME New submission

Sequential Kernel-based Conditional Independence Testing via Adaptive Betting

Zheng He, Danica J. Sutherland

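AI summary: Proposes a sequential conditional independence test that is more robust to estimation error, combining an adaptively optimized kernel conditional independence statistic with normalization and truncate-and-shift calibration; it controls Type I error while maintaining high power on synthetic and real data.

Comments: Published at ICML 2026: https://openreview.net/forum?id=vUMdIyTs9c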

Abstract

Testing conditional independence is fundamental yet intrinsically difficult: without additional assumptions, Type I error control is impossible in general. The "Model-X" paradigm addresses this difficulty by assuming exact knowledge of a relevant conditional distribution. While small deviations from this assumption can sometimes be tolerated in classical one-shot testing, existing sequential conditional independence tests typically require the Model-X conditional to be known exactly, making them fragile when it must instead be estimated. We propose a new approach that is substantially more robust to such estimation error. Our method applies testing-by-betting to an adaptively optimized Kernel Conditional Independence statistic, together with a normalization scheme and a truncate-and-shift calibration strategy. These modifications greatly reduce Type I error inflation while preserving high power across high-dimensional synthetic benchmarks and real-world fairness tasks, outperforming existing sequential Model-X approaches. Code is available at https://github.com/he-zh/SKCI.
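
The testing-by-betting principle mentioned in the abstract can be sketched generically. A bettor starts with unit wealth and stakes a fraction of it on each payoff; if payoffs have non-positive conditional mean under the null, the wealth is a non-negative supermartingale, and rejecting once it reaches $1/\alpha$ keeps the Type I error at most $\alpha$ by Ville's inequality. The function below is a hypothetical illustration with a fixed betting fraction; it does not implement the paper's kernel statistic, adaptive bets, normalization, or calibration.

```python
def betting_test(payoffs, alpha=0.05, bet=0.5):
    """Sequential test by betting with a fixed betting fraction.

    payoffs: iterable of values in [-1, 1] whose conditional mean is
             non-positive under the null hypothesis
    bet:     fixed fraction in [0, 1) of wealth staked each round
    Returns the 1-based round at which the null is rejected, or None.
    """
    wealth = 1.0
    for t, g in enumerate(payoffs, start=1):
        wealth *= 1.0 + bet * g    # stays positive because |bet * g| < 1
        if wealth >= 1.0 / alpha:  # rejection threshold from Ville's inequality
            return t
    return None

# Consistently positive payoffs (strong evidence against the null).
stop_strong = betting_test([1.0] * 20)
# Alternating payoffs with zero mean: wealth shrinks, no rejection.
stop_null_like = betting_test([1.0, -1.0] * 50)
```

Because the test can stop as soon as the threshold is crossed, the sample size adapts to the strength of the evidence, which is the appeal of the sequential setting.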

2606.18979 2026-06-18 eess.AS cs.CL cs.SD New submission

Mitigating Scoring Errors and Compensating for Nonverbal Subtests in Speech-Based Dementia Assessment

Franziska Braun, Christopher Witzl, Andreas Erzigkeit, Hartmut Lehfeld, Thomas Hillemacher, Tobias Bocklet, Korbinian Riedhammer

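AI summary: Fuses transcript-derived scores with Whisper embeddings to reduce scoring errors in speech-based assessment, and uses the fused representations to approximate expert overall ratings, compensating for the missing motor subtests and effectively discriminating between cognitive status groups.

Comments: Accepted at INTERSPEECH 2026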

Abstract

Early detection of cognitive impairment relies on neuropsychological tests to minimize subjectivity by assessing multiple cognitive domains. Speech-based evaluation can support diagnostics and improve accessibility, but transcription errors and the omission of nonverbal subtests (e.g., motor skills) limit accuracy. Beyond conventional test scores, speech-derived features can provide additional insights into cognitive status. This study investigates the speech-based evaluation of the German "Syndrom-Kurz-Test," a standardized dementia screening test comprising verbal and motor subtests. We train models that integrate transcript-derived scores and Whisper embeddings per verbal subtest to reduce scoring errors. To compensate for missing motor subtests, we then leverage these fused representations to approximate expert overall ratings. Despite omitting subtests, our models strongly correlate with expert ratings and efficiently and accurately discriminate between cognitive status groups.

2606.18972 2026-06-18 stat.ML cs.LG New submission

FOSC-X: An Extended Framework for Optimal Local Cuts and Non-Horizontal Cluster Selection from Clustering Hierarchies

Connor Simpson, Ricardo J. G. B. Campello

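AI summary: Proposes the FOSC-X framework, which uses dynamic programming to extract the top-M globally optimal flat clusterings from local, non-horizontal cuts of a hierarchical cluster tree, supports cluster-count constraints, and guarantees the optimal ranking in linear time.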

Abstract

Extracting a flat clustering solution from a hierarchy is a common task in practical cluster analysis and can be formulated as an optimisation problem. Existing approaches focus on finding a single optimal solution. We introduce FOSC-X, a framework for extracting the top-M globally optimal flat clusterings from local, non-horizontal cuts of a hierarchical cluster tree, while optionally enforcing constraints on the number of clusters. This enables automatic identification of multiple high-quality alternative clusterings that capture different aspects of the hierarchical structure. Without constraints, the top-M problem can be solved in polynomial time using dynamic programming, exploiting the property that locally optimal partial candidates within subtrees can be combined to form globally optimal solutions while automatically determining the number of clusters. However, this can lead to solutions with numbers of clusters that are ultimately undesirable -- e.g., too large to be meaningful or practically analysed within a particular application domain. Imposing cluster-count constraints breaks the optimality property underlying the unconstrained dynamic programming approach, since locally optimal partial candidates may no longer combine into feasible globally optimal solutions. FOSC-X addresses this challenge through a dynamic programming strategy that maintains compact sets of feasible candidates using lower and upper feasibility bounds while pruning infeasible or dominated combinations. The resulting method guarantees optimal rankings of the top-M solutions with linear-time complexity in the number of cluster nodes and dataset size, both with and without cluster-count constraints. Experiments show that FOSC-X efficiently reveals alternative clustering structures overlooked by single-solution extraction methods.
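
The property that locally optimal partial candidates combine into a global optimum can be illustrated with the single-solution, unconstrained case. The sketch below is a simplified bottom-up dynamic program that assumes an additive quality score per cluster; it is not FOSC-X itself and handles neither top-M ranking nor cluster-count constraints. The tree, the scores, and the function name are hypothetical.

```python
def best_flat_clustering(tree, quality, node):
    """Best selection of non-overlapping clusters in the subtree at `node`.

    tree:    dict mapping a node to its list of children (leaves omitted)
    quality: dict mapping each node to an additive quality score
    Returns (total quality, list of selected cluster nodes).
    """
    children = tree.get(node, [])
    if not children:
        return quality[node], [node]
    child_total, child_selection = 0.0, []
    for child in children:
        total, selection = best_flat_clustering(tree, quality, child)
        child_total += total
        child_selection += selection
    # Keep the node as one cluster only if it beats the best split below it.
    if quality[node] >= child_total:
        return quality[node], [node]
    return child_total, child_selection

# Hypothetical hierarchy: root -> {a, b}, a -> {a1, a2}.
tree = {"root": ["a", "b"], "a": ["a1", "a2"]}
quality = {"root": 1.0, "a": 3.0, "b": 2.0, "a1": 2.0, "a2": 2.0}
result = best_flat_clustering(tree, quality, "root")
```

The selected clusters sit at different depths of the tree (`a1` and `a2` below `a`, alongside `b`), which is what a local, non-horizontal cut means. Each node is visited once, so the running time is linear in the number of cluster nodes.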

2606.18969 2026-06-18 stat.ME cs.MS stat.ML New submission

Balanced Twins: Causal Inference on Time Series with Hidden Confounding

Ouali Maha, Ghattas Badih, Flachaire Emmanuel, Charpentier Philippe, Bozzi Laurent

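AI summary: Proposes a neural framework that simultaneously learns low-dimensional latent representations of individual time series and propensity scores, recovers counterfactuals through flexible matching, and estimates the average treatment effect for the treated; suited to staggered interventions and hidden confounding.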

Abstract

Accurately estimating treatment effects in time series is essential for evaluating interventions in real-world applications, especially when treatment assignment is biased by unobserved factors. In many practical settings, interventions are adopted at different times across individuals, leading to staggered treatment exposure and heterogeneous pre-treatment histories. In such cases, aggregating outcome trajectories across treated units is ill-defined, making individual treatment effect (ITE) estimation a prerequisite for reliable causal inference. We therefore study the problem of estimating the average treatment effect for the treated (ATT) by first recovering individual-level counterfactuals. We introduce a neural framework that simultaneously learns low-dimensional latent representations of individual time series and propensity scores. These estimates are then used to approximate the individual treatment effects through a flexible matching procedure that avoids classical convexity constraints commonly used in synthetic control methods. By operating at the individual level, our approach naturally accommodates staggered interventions and improves counterfactual estimation under latent bias, without relying on explicit temporal modeling assumptions. We illustrate our approach on both real-world energy consumption data and clinical time series, including high-frequency electricity demand-response programs and semi-synthetic data for individuals in the intensive care unit (ICU), where hidden confounding, staggered treatment adoption, and non-stationary dynamics are prevalent.

2606.18853 2026-06-18 stat.ML cs.LG New submission

Kernel of Partition Paths: A Unified Representation for Tree Ensembles

Nicolas Mahler

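AI summary: Proposes the KPP kernel, which indexes the forest's nodes with a path metric and unifies prediction, exact additive attribution, a deterministic Lipschitz robust radius, and Rademacher risk bounds, giving a geometric framework for tree ensembles.

Comments: 31 pages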

Abstract

A recent line of work has reframed individual decision trees as linear models on engineered features associated with their splits, opening routes for oracle inequalities and feature-importance reinterpretation, but leaving open the question of what unified geometric object a forest induces when one indexes its feature map by nodes rather than by splits. The present paper studies that object. KPP indexes the feature map by the nodes of the forest, weighted by a path metric that turns each coordinate into a component of a squared-Euclidean path-isometric embedding. KPP unifies four pillars under a single non-diagonal Gram that carries a metric: prediction, exact additive attribution, deterministic Lipschitz robust radius in the KPP metric, and uniform Rademacher risk bounds for regression and classification under fixed, honest, or cross-fit conditioning. All probabilistic guarantees are conditional on the representation and are stated under three explicit conditioning regimes; the robust-radius guarantee is deterministic in the KPP metric rather than in a norm on the raw input. Conjectured fast-rate refinements for both regression and classification are stated as open problems and are not claimed as theorems.

2606.18750 2026-06-18 stat.AP cs.LG New submission

Ensuring Trustworthy Online A/B Testing: Addressing Five Key Questions on CUPED

Yu Zhang, Bokui Wan, Yongli Qin, Jinyong Ma, Yifan Guo

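AI summary: Systematically addresses five common but overlooked questions in applying CUPED, including the optimal adjustment specification, the validity of regression adjustment, and robust variance estimation, and extends to multi-arm experiments and two-stage sampling designs; the methods are supported by theoretical analysis and experimental validation and have been deployed on ByteDance's platform.

Comments: 15 pages, 3 figures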

Abstract

A/B testing has become the gold standard for data-driven decision-making in large-scale online experimentation, providing critical guidance for feature launch, pricing optimization, and user experience enhancement. To maximize statistical sensitivity, many technology companies routinely employ Controlled-experiment Using Pre-Experiment Data (CUPED), a technique that achieves substantial variance reduction while preserving the unbiasedness of estimating the average treatment effect. Despite its widespread adoption, several critical methodological and practical nuances of CUPED remain underexplored. This paper systematically addresses five frequently encountered yet overlooked questions regarding the application of CUPED. First, we provide a comparative analysis of various post-CUPED estimators to identify the optimal adjustment specification. Second, we evaluate the validity of regression-based adjustments and delineate robust variance estimation methods tailored for such frameworks. Finally, we extend our investigation to complex but common scenarios, including multi-arm experiments and two-stage sampling designs. Our findings reveal that in these settings, naive reliance on standard variance estimators can lead to severely misleading inferences. By offering rigorous theoretical insights and extensive experimental validation, this work deepens the conceptual understanding of CUPED. Notably, the recommended methodologies have been successfully deployed and integrated into ByteDance's experimentation platform.
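
As background for the variance-reduction claim, the textbook single-covariate CUPED adjustment replaces each outcome $Y$ with $Y - \theta\,(X - \bar{X})$, where $X$ is a pre-experiment covariate and $\theta = \mathrm{Cov}(Y, X) / \mathrm{Var}(X)$. The sketch below shows only this basic form on simulated data; the function name and the simulated metrics are hypothetical, and it does not reproduce the estimators or variance formulas the paper compares.

```python
import random
from statistics import mean, variance

def cuped_adjust(y, x):
    """Return y - theta * (x - mean(x)) with theta = cov(y, x) / var(x)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    theta = cov / variance(x)
    return [b - theta * (a - mx) for a, b in zip(x, y)]

random.seed(0)
# Hypothetical data: the pre-experiment metric x is strongly
# correlated with the in-experiment metric y.
x = [random.gauss(10, 2) for _ in range(2000)]
y = [xi + random.gauss(0, 1) for xi in x]
y_adj = cuped_adjust(y, x)
```

The adjusted metric keeps the same sample mean while its variance shrinks by roughly the squared correlation between `x` and `y`. In an actual experiment, `theta` is commonly estimated on data pooled across arms and the adjusted means are then compared between arms.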

2606.18734 2026-06-18 eess.SP cs.LG New submission

Point-Cloud-Assistant Localized Statistical Channel Prediction by Tangent Gaussian Splatting

Ye Xue, Yiheng Wang, Xinhua Shao, Qi Yan, Shutao Zhang, Tsung-Hui Chang

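AI summary: Proposes the point-cloud-assisted tangent Gaussian splatting (PC-TGS) framework, which fuses sparse radio measurements with dense LiDAR geometry to extrapolate the angular power spectrum to unmeasured grids, enabling efficient channel prediction in large-scale wireless digital twins.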

Abstract

Accurate, site-specific channel information is crucial for optimizing next-generation wireless networks. Among various approaches, localized statistical channel modeling (LSCM), which models the channel multipath angular power spectrum (APS) from reference signal received power (RSRP) measurements, has emerged as a state-of-the-art method tailored for efficient network optimization. However, despite its effectiveness, LSCM cannot predict APS at the vast majority of locations where no measurements are available, which significantly restricts its applicability in large-scale, real-world scenarios. To address this challenge, we present point-cloud-assisted tangent Gaussian splatting (PC-TGS), the first framework to extrapolate APS to unmeasured outdoor grids by integrating sparse radio measurements with dense LiDAR-based geometry. PC-TGS represents environmental scatterers as anisotropic 3D Gaussians, initialized and refined through a relaxed-mean reparameterization of the raw point cloud. A tangent-plane projection accurately maps each Gaussian into the local angular domain, while a depth-aware electromagnetic splatting process aggregates their contributions. To ensure practical deployment, we derive a closed-form Gaussian-weighted average (GWA) for APS bin integration and provide a provable error bound. Evaluations on a LiDAR-scanned city-scale dataset (5M points, 6,310 RSRP samples) demonstrate that PC-TGS achieves better APS and RSRP prediction performance compared to state-of-the-art baselines and faster inference time for the APS extrapolation task. These results highlight the potential of PC-TGS to enable geometry-aware and data-efficient channel prediction in large-scale wireless digital twins.

2606.18729 2026-06-18 stat.ML cs.LG New submission

TimeLAVA: Learning-Agnostic Data Valuation for Time Series

Wenqin Liu, Weizhi Quan, Aoqi Zuo, Erdun Gao, Vu Nguyen, Dino Sejdinovic, Howard Bondell, Mingming Gong

Affiliations: School of Mathematics and Statistics, The University of Melbourne; Statistics, The University of Melbourne; Statistics, University of Sydney; Responsible AI Research Centre, Australian Institute for Machine Learning; Amazon; School of Mathematical Sciences, Adelaide University; Department of Machine Learning, MBZUAI

AI summary: Proposes TimeLAVA, a learning-agnostic framework that uses wavelet transforms and optimal transport to value time-series segments by their marginal contribution to distributional discrepancy, without model training; it outperforms existing methods on anomaly detection, data pruning, and label noise detection.

Comments: 34 pages

Journal ref: ICML 2026

Abstract

Data valuation quantifies the intrinsic quality of individual samples to enable principled data curation, quality control, and robust learning. For time series in critical domains such as healthcare, finance, and industrial monitoring, effective valuation methods are essential yet fundamentally lacking. Existing approaches are either model-dependent, limiting their generalizability, or designed for i.i.d. data and thus fail to capture temporal dependencies, multi-scale patterns, and non-stationary dynamics inherent to sequential data. We introduce TimeLAVA, a learning-agnostic framework that values temporal segments by their marginal contribution to minimizing distributional discrepancy between evaluated and reference data. At its core is a novel Selective Wavelet-based Wasserstein discrepancy combining multi-scale wavelet transforms for temporal localization with unbalanced optimal transport for robustness to distributional shifts. Segment values are efficiently computed via sensitivity analysis without requiring model training and aggregated into point-wise scores. We provide theoretical guarantees linking valuation to model-agnostic generalization and prove bounded sensitivity to outlier contamination. Extensive experiments across anomaly detection, data pruning, and label noise detection demonstrate that TimeLAVA produces significantly more informative value scores than existing methods on diverse real-world datasets.


2606.18645 2026-06-18 eess.AS cs.AI 新提交

Augmenting Dysarthric Speech Severity Assessment with MOS Supervision

通过MOS监督增强构音障碍语音严重程度评估

Kaimeng Jia, Minzhu Tu, Zengrui Jin, Siyin Wang, Chao Zhang

Affiliations: Tsinghua University; Beijing University of Posts

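AI summary: Proposes augmenting dysarthric speech assessment with speech synthesis evaluation data (MOS labels from the QualiSpeech corpus). Fine-tuning improves both intelligibility and naturalness prediction, joint training mainly improves naturalness, and the approach reduces reliance on clinical annotations.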


AI中文摘要

构音障碍是一种以可懂度和交际有效性降低为特征的言语障碍。自动的构音障碍语音话语级评估可以支持可扩展的语音监测和治疗相关分析。然而，训练此类系统受到临床标注构音障碍语音稀缺的瓶颈限制。本工作提出利用语音合成评估数据，特别是来自QualiSpeech语料库的带有平均意见得分（MOS）标签的人工标注话语，来增强构音障碍语音评估。实验表明，在语音合成评估数据上微调持续提高了可懂度和自然度预测的性能，而联合训练主要在自然度上带来提升。这些结果表明，合成伪影和构音障碍语音共享感知共性，语音合成评估语料库提供了一种实用的增强来源，减少了对稀缺临床标注的依赖。

英文摘要

Dysarthria is a speech disorder marked by reduced intelligibility and communicative effectiveness. Automatic utterance-level assessment of dysarthric speech can support scalable speech monitoring and therapy-related analysis. Yet training such systems is bottlenecked by the scarcity of clinically annotated dysarthric speech. This work proposes to augment dysarthric speech assessment using data from speech synthesis evaluations, specifically human-annotated utterances with Mean Opinion Score (MOS) labels from the QualiSpeech corpus. Experiments show that fine-tuning on speech synthesis assessment data consistently improves performance on both intelligibility and naturalness prediction, while joint training yields gains primarily on naturalness. These results suggest that synthesis artifacts and dysarthric speech share perceptual commonalities, and speech synthesis evaluation corpora offer a practical augmentation source that reduces reliance on scarce clinical annotations.


2606.18574 2026-06-18 econ.TH cs.GT 新提交

Stable and Fair Random Allocations in a Two-Sided Discrete-Concave Market

双边离散凹市场中的稳定与公平随机分配

Kenzo Imamura, Yasushi Kawase

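AI summary: Addressing the stability and fairness problems of random allocations in two-sided environments, the paper uses discrete concave (M$^\natural$-concave) valuations to prove that an ex ante stable and fair allocation exists, and uses a generalization of the Birkhoff–von Neumann theorem to decompose ex ante stable fractional allocations into lotteries over stable deterministic allocations.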

Comments Appears in the Twenty-Seventh ACM Conference on Economics and Computation (EC'26)


AI中文摘要

随机分配被广泛用于处理双边环境中的平局和无差异。在这种环境中，常用的程序如随机破平可能无法从事前角度确保稳定性和公平性。我们证明，当代理人具有离散凹（M$^\natural$-凹）估值时，存在事前稳定且公平的分配。为了建立这一结果，我们将我们的框架与Alkan和Gale引入的稳定性模型联系起来。特别地，我们证明事前稳定且公平的分数分配恰好被刻画为在由凹闭包诱导的选择函数下，结合对称严格凸破平规则的Alkan–Gale稳定结果。我们进一步证明，任何事前稳定的分数分配都可以通过Birkhoff–von Neumann定理的推广，分解为稳定确定性分配的彩票。最后，我们研究了一个不依赖基数估值而假设序数偏好的设定。在这个序数框架内，我们建立了事前稳定且公平的分数分配的存在性。该设定在拟阵约束下的带合同匹配框架中表述。由此产生的类包括现有模型，例如具有响应选择对应的一对多随机分配，并涵盖了广泛的应用，包括带有彩票的受控学校选择。

英文摘要

Random allocations are widely used to handle ties and indifferences in two-sided environments. In such environments, commonly used procedures such as random tie-breaking may fail to ensure stability and fairness from an ex ante perspective. We show that when agents have discrete concave (M$^\natural$-concave) valuations, there exists an ex ante stable and fair allocation. To establish this result, we relate our framework to the model of stability introduced by Alkan and Gale. In particular, we show that ex ante stable and fair fractional allocations are exactly characterized as Alkan--Gale stable outcomes under choice functions induced from concave closures together with a symmetric strictly convex tie-breaking rule. We further prove that any ex ante stable fractional allocation can be decomposed into a lottery over stable deterministic allocations, using a generalization of the Birkhoff--von Neumann theorem. Finally, we study a setting that does not rely on cardinal valuations and instead assumes ordinal preferences. Within this ordinal framework, we establish the existence of an ex ante stable and fair fractional allocation. This setting is formulated within the matching-with-contracts framework under matroid constraints. The resulting class includes existing models, such as one-to-many random allocation with responsive choice correspondences, and captures a wide range of applications, including controlled school choice with lotteries.
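The decomposition step mentioned above builds on the classical Birkhoff–von Neumann theorem: every doubly stochastic matrix is a convex combination of permutation matrices. The sketch below illustrates only that classical statement for small matrices, not the paper's generalization to stable allocations. The function name `birkhoff_decompose` and the brute-force search over permutations are illustrative assumptions.

```python
from itertools import permutations


def birkhoff_decompose(matrix, tol=1e-9):
    """Greedily split a small doubly stochastic matrix into a lottery.

    Returns a list of (weight, perm) pairs, where perm[i] is the column
    assigned to row i. Brute force over permutations, so only for small n.
    """
    n = len(matrix)
    remaining = [row[:] for row in matrix]  # work on a copy
    lottery = []
    while True:
        best = None
        for perm in permutations(range(n)):
            # Largest weight that can be removed along this permutation.
            weight = min(remaining[i][perm[i]] for i in range(n))
            if weight > tol and (best is None or weight > best[0]):
                best = (weight, perm)
        if best is None:  # no mass left to assign
            break
        weight, perm = best
        for i in range(n):
            remaining[i][perm[i]] -= weight
        lottery.append((weight, perm))
    return lottery


fractional = [[0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.5]]
for weight, perm in birkhoff_decompose(fractional):
    print(weight, perm)
```

Each iteration zeroes at least one entry of the remaining matrix, so the loop terminates, and the weights of the returned deterministic assignments sum to one for a doubly stochastic input.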


2606.18567 2026-06-18 stat.ML cs.LG stat.AP stat.ME 新提交

Bridging Data Gaps in Structural Fragility Modeling through Transfer Learning: Methodology and Case Studies

通过迁移学习弥合结构易损性建模中的数据空白：方法与案例研究

Narges Saeednejad, Jamie Ellen Padgett

Affiliations: Department of Civil and Environmental Engineering, Rice University; Ken Kennedy Institute, Rice University

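AI summary: Proposes a methodology-centered transfer learning framework that addresses domain shift, class imbalance, and scarce target labels, and shows through three case studies that it improves failure detection and predictive stability in low-data regimes.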

Comments 24 pages, 12 figures


AI中文摘要

本文提出了一个以方法为中心的迁移学习框架，用于在领域偏移、类别不平衡和目标标签稀缺的情况下进行易损性自适应，同时保持工程可解释性并支持不确定性下的决策。通过三个互补的案例研究展示了四种迁移学习策略（基于实例、基于参数、分层贝叶斯和多源）：(i) 基于实例的迁移学习通过重要性加权，利用卡特里娜飓风观测数据演示了沿海桥梁易损性；(ii) 基于参数的迁移学习结合分层贝叶斯迁移学习，实现了跨层的部分合并和后验不确定性量化，利用伊恩飓风观测数据演示了住宅建筑易损性；(iii) 多源迁移学习融合多个分析易损性模型，学习源权重并进行正则化的目标域自适应，利用2001年尼斯夸利地震观测数据演示了地震桥梁易损性。在这些案例研究中，直接迁移源模型（即使用现有最先进模型）在领域偏移和严重类别不平衡下失败，而有针对性的自适应在低数据场景下显著提高了失效检测和预测稳定性。这些发现强调了在开发和自适应易损性模型时，需要对诊断、策略选择和不确定性报告提供系统指导。

英文摘要

This paper presents a methodology-centered transfer learning framework for fragility adaptation under domain shift, class imbalance, and scarce target labels while preserving engineering interpretability and supporting decision-making under uncertainty. Four transfer learning strategies (instance-based, parameter-based, hierarchical Bayesian, and multi-source) are demonstrated through three complementary case studies: (i) instance-based transfer learning via importance weighting, demonstrated on coastal bridge fragility using Hurricane Katrina observations; (ii) parameter-based transfer learning together with hierarchical Bayesian transfer learning, enabling partial pooling across strata and posterior uncertainty quantification, demonstrated on residential building fragility using Hurricane Ian observations; and (iii) multi-source transfer learning that fuses multiple analytical fragility models with learned source weights and regularized target-domain adaptation, demonstrated on seismic bridge fragility using observations from the 2001 Nisqually earthquake. Across these case studies, direct transfer of source models (i.e. using existing state-of-the-art models) fails under domain shift and severe class imbalance, while targeted adaptation substantially improves failure detection and predictive stability in low-data regimes. These findings highlight the need for systematic guidance on diagnostics, strategy selection, and uncertainty reporting when developing and adapting fragility models.


2606.18536 2026-06-18 stat.AP cs.SE 新提交

Analytics for Quality Assurance for Item Pools (AQuAP): Monitoring and Maintaining Item Bank Health in AI-Driven Assessment Systems

题库质量保证分析（AQuAP）：AI驱动评估系统中题库健康的监控与维护

Alina A. von Davier, Xiaowan Zhang, Yigal Attali, Yena Park, Jacqueline Church, Andrew Runge, Geoff T. LaFlair, Alexander Tsigler

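AI summary: Presents the AQuAP dashboard environment, which monitors item bank quality through metrics such as Effective Bank Size, supports large-scale automated and human-supported item development, and helps maintain item bank health for high-stakes tests.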

Comments 11 pages, 4 figures


AI中文摘要

教育评估的大规模数字化使得题库的持续监督既必要又复杂。本文提出了题库质量保证分析（AQuAP），一个用于监控试题质量和题库健康的仪表盘环境。AQuAP支持高利害测试中大规模试题生成程序的操作实施，这些程序包含在试题工厂（一个自动化和人工支持的测试开发框架）中。本文描述了AQuAP与试题开发过程的关系，概述了题库质量保证的更广泛度量框架，并强调了有效题库规模（EBS）作为题库活力的核心指标。EBS量化了在内容重复发生之前可以构建的独立测试会话数量，当与曝光度和使用度量结合时，它提供了对题库安全性、多样性和效率的洞察。我们进一步引入了题库健康度量，如最大曝光度、最大条件曝光度、调整后的有效题库规模和极少施测比例，所有这些都扩展了试题利用情况的图景。AQuAP展示了操作分析如何将心理测量概念转化为高容量、AI驱动的测试程序的质量保证工具。本文以多邻国英语测试（DET）流程为例进行说明。

英文摘要

The large-scale digitization of educational assessment has made the continuous oversight of item banks both essential and complex. This paper presents Analytics for Quality Assurance for Item Pools (AQuAP), a dashboard environment for monitoring item quality and item bank health. AQuAP supports the operational implementation of the large scale item generation procedures for high-stakes tests as included in the Item Factory, a framework for automated and human-supported test development. The paper describes AQuAP in relationship with the process of item development, outlines the broader metric framework for item-pool quality assurance, and highlights the Effective Bank Size (EBS) as one central indicator of pool vitality. EBS quantifies how many independent test sessions can be constructed before content repetition occurs and, when coupled with exposure and usage metrics, provides insight into item bank security, diversity, and efficiency. We further introduce bank-health metrics, such as maximum exposure, maximum conditional exposure, adjusted effective bank size, and the rarely-administered fraction, all of which extend this picture of item utilization. AQuAP illustrates how operational analytics can translate psychometric concepts into quality assurance tools for high-volume, AI-enabled testing programs. This work is illustrated with the Duolingo English Test (DET) processes.


2606.18531 2026-06-18 stat.ML cs.LG 新提交

When Does Trajectory-Level Supervision Permit Efficient Offline Reinforcement Learning?

轨迹级监督何时允许高效的离线强化学习？

Xuanfei Ren, Tengyang Xie

Affiliations: University of Wisconsin-Madison

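AI summary: Studies the statistical theory of offline reinforcement learning when policy optimization uses only trajectory-level outcomes (such as cumulative returns or preferences), proposes the OPAC algorithm with a proven sample complexity, and identifies statistical barriers that arise under nonlinear aggregation objectives.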

Comments 69 pages


AI中文摘要

离线强化学习通常在过程级奖励监督下进行分析，然而许多序列决策数据集仅记录轨迹级结果。我们发展了从这种结果级监督进行离线策略优化的统计理论。首先研究规范设置，其中目标仍是期望累积奖励，但每个离线轨迹仅提供一个标量标签，其条件均值是累积回报。我们提出OPAC，一种悲观演员-评论家算法，它学习潜在奖励模型并从轨迹级标签优化策略。我们证明了阶为$\widetilde O(H^2\sqrt{C_{sa}(\pi^\star)/n})$的高概率保证和匹配的下界，刻画了用单个轨迹级标签替代过程级奖励的尖锐统计代价。然后我们将该原理扩展到基于偏好的反馈，在偏好模型常数范围内保留了领先的视界和可集中性依赖。最后，我们研究广义基于结果的离线强化学习，其中监督和目标都是由潜在每步奖励的非线性聚合引起的轨迹级量。该问题通常不可学习：对于全成功目标，即使具有确定性转移和常数可集中性，任何离线学习器可能需要$\Omega(2^H)$个轨迹。然后我们通过两个结构系数$\kappa_\mu(\sigma)$和$\chi_\mu(\sigma)$识别出一个可处理的区域，这两个系数捕捉了结果聚合和广义贝尔曼更新中的信息损失，在此区域广义OPAC实现了多项式样本复杂度。我们的结果共同描绘了何时结果级监督能够实现样本高效的离线控制，以及何时缺失过程级奖励会带来根本性的统计障碍。

英文摘要

Offline reinforcement learning is typically analyzed under process-level reward supervision, yet many sequential decision datasets record only trajectory-level outcomes. We develop a statistical theory for offline policy optimization from such outcome-level supervision. We first study the canonical setting where the target remains the expected cumulative reward, but each offline trajectory provides only a scalar label whose conditional mean is the cumulative return. We propose OPAC, a pessimistic actor-critic algorithm that learns a latent reward model and optimizes a policy from trajectory-level labels. We prove a high-probability guarantee of order $\widetilde O(H^2\sqrt{C_{sa}(π^\star)/n})$ and a matching lower bound, characterizing the sharp statistical cost of replacing process-level rewards with one trajectory-level label. We then extend the principle to preference-based feedback, preserving the leading horizon and concentrability dependence up to preference-model constants. Finally, we study generalized outcome-based offline RL, where both the supervision and the objective are trajectory-level quantities induced by a nonlinear aggregation of latent per-step rewards. This problem is not learnable in general: for all-success objectives, any offline learner may require $Ω(2^H)$ trajectories even with deterministic transitions and constant concentrability. We then identify a tractable regime through two structural coefficients, $κ_μ(σ)$ and $χ_μ(σ)$, capturing information loss in outcome aggregation and generalized Bellman updates, under which generalized OPAC achieves polynomial sample complexity. Together, our results delineate when outcome-level supervision enables sample-efficient offline control and when missing process-level rewards create fundamental statistical barriers.


2606.18527 2026-06-18 stat.ML cs.LG 新提交

Toward Simultaneously Optimal Regret in U-Calibration

面向同时最优遗憾的U-校准

Rafael Frongillo, Haipeng Luo, Nishant A. Mehta, Jon Schneider

Affiliations: University of Colorado Boulder; University of Southern California; Google Research

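AI summary: Proposes an FTPL variant based on self-concordant noise that achieves optimal $\tilde O(\sqrt{T})$ regret for all bounded proper losses and logarithmic regret for smooth losses.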

Comments 30 pages; to appear at COLT 2026


AI中文摘要

U-校准研究在线预测算法，其预测可被任何未知下游智能体使用，同时保证对所有适当损失函数的次线性遗憾。现有U-校准算法对每个有界适当损失实现了最坏情况最优的$O(\sqrt{T})$遗憾，但它们未能适应更简单的损失：如我们所示，即使对于平方损失等光滑损失，它们也会产生$\Omega(\sqrt{T})$遗憾，而不是最优的$O(\log T)$遗憾。在这项工作中，我们表明这一局限性并非固有。具体来说，我们设计了一个单一的预测算法，同时对所有有界适当损失实现$\tilde O(\sqrt{T})$遗憾，并对所有有界光滑适当损失实现$O(\log T)$遗憾。更一般地，我们的算法还对于相对于对数障碍光滑的损失（包括几个非Lipschitz例子）实现了对数遗憾。我们的方法基于一种新颖的跟随扰动领导者（FTPL）变体，其中使用自和谐噪声直接在预测空间中应用扰动。由于这种噪声的复杂性质，所得分析也大大偏离了先前的FTPL分析，可能具有独立意义。

英文摘要

U-calibration studies online forecasting algorithms whose predictions can be consumed by any unknown downstream agent, guaranteeing sublinear regret simultaneously for all proper loss functions. Existing U-calibration algorithms achieve worst-case optimal $O(\sqrt{T})$ regret for every bounded proper loss, but they fail to adapt to easier losses: as we show, even for smooth losses such as squared loss, they incur $Ω(\sqrt{T})$ regret instead of the optimal $O(\log T)$ regret. In this work, we show that this limitation is not inherent. Specifically, we design a single forecast algorithm that simultaneously achieves $\tilde O(\sqrt{T})$ regret for every bounded proper loss and $O(\log T)$ regret for every bounded smooth proper loss. More generally, our algorithm also attains logarithmic regret for losses that are smooth relative to the log-barrier, which include several non-Lipschitz examples. Our approach is based on a novel variant of Follow-the-Perturbed-Leader (FTPL) in which perturbations are applied directly in the prediction space using self-concordant noise. The resulting analysis also departs substantially from prior FTPL analyses due to the complex nature of this noise and may be of independent interest.


2606.18523 2026-06-18 q-bio.QM cs.CV 新提交

DART: A design-aware microfluidic chip paradigm for real-time live-cell image analysis

DART: 一种设计感知的微流控芯片范式用于实时活细胞图像分析

Johannes Seiffarth, Matthias Pesch, Lukas Scholtes, Dietrich Kohlheyer, Hanno Scharr, Katharina Nöh

Affiliations: Institute for Bio- and Geosciences, IBG-1: Biotechnology; Computational Systems Biotechnology (AVT.CSB), RWTH Aachen University; Institute for Advanced Simulation, IAS-8: Data Analytics and Machine Learning

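AI summary: Proposes the DART paradigm, which aligns the CAD blueprint with the physical chip through embedded markers and deep-learning-based marker detection. This enables fast localization of all regions of interest and fully automated image processing on high-throughput microfluidic chips, with support for real-time analysis.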


AI中文摘要

高通量微流控活细胞成像产生丰富的单细胞数据。然而，用于定位每个包含一个细胞群体的感兴趣区域（RoI）并从记录图像中移除周围微流控结构的半自动化流程随RoI数量扩展，这阻碍了实时图像分析并将洞察时间延迟数小时至数天。我们提出了用于微流控培养芯片的设计感知和实时能力（DART）范式，该范式将CAD蓝图与物理芯片对齐，从而实现了对所有RoI的通量无关定位以及跨不同RoI几何形状和芯片布局的全自动图像处理。DART通过嵌入式基准标记和基于深度学习的标记检测建立这种对齐。我们使用瑞士军刀芯片验证DART，该芯片在1164个RoI位置上组合了八种结构不同的RoI设计。DART在五分钟内定位所有RoI，在40毫秒内从原始显微镜图像中移除微流控结构，并在每张图像1.1秒内执行全自动图像分析，包括细胞分割。这些能力共同使DART成为一个端到端的硬件-软件范式，具有实时分析能力，为闭环和结果驱动的智能显微镜铺平了道路。

英文摘要

High-throughput microfluidic live-cell imaging generates rich single-cell data. Yet semi-automated procedures for locating regions of interest (RoIs), each containing one cell population, and removing surrounding microfluidic structures from recorded images, scale with the number of RoIs. This prevents real-time image analysis and delays time-to-insight by hours to days. We introduce the Design-Aware and Real-Time capable (DART) paradigm for microfluidic cultivation chips, which aligns the CAD blueprint with the physical chip and thereby enables throughput-independent localization of all RoIs and fully automated image processing across diverse RoI geometries and chip layouts. DART establishes this alignment through embedded fiducial markers and deep-learning-based marker detection. We validate DART using the Swiss Army Knife chip, which combines eight structurally distinct RoI designs across 1164 RoI locations. DART localizes all RoIs in five minutes, removes microfluidic structures from raw microscopy images in 40 ms, and performs fully automated image analysis, including cell segmentation, in under 1.1 s per image. Together, these capabilities establish DART as an end-to-end hardware-software paradigm with real-time-capable analysis that paves the way toward closed-loop and outcome-driven smart microscopy.


2606.18520 2026-06-18 stat.ML cs.CG cs.CL cs.DS cs.IR cs.LG 新提交

Compact Geometric Representations of Hierarchies

层次结构的紧凑几何表示

Prashant Gokhale, Piotr Indyk, Yuhao Liu, Sandeep Silwal, Tony Chang Wang, Haike Xu

Affiliations: UW-Madison; MIT

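AI summary: Studies how to represent ancestor-descendant relationships in directed acyclic graphs with low-dimensional geometric embeddings, gives upper and lower bounds on the dimension in terms of structural parameters such as treewidth, and verifies compactness on real-world datasets.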

Comments Published at the 39th Annual Conference on Learning Theory (COLT) 2026. 22 Pages


AI中文摘要

计算数据的几何表示是现代机器学习的基石，通常通过训练双编码器将查询和文档映射到共享嵌入空间来实现。You等人[NeurIPS '25]的最新工作将这种方法扩展到层次检索，其中相关性由有向无环图（DAG）中的祖先-后代关系决定。虽然先前的工作表明当后代数量较少时存在有效嵌入，但这些界限对于深层层次结构会严重退化，所需维度与节点总数相当。在本文中，我们研究了更一般图类的紧凑可达性嵌入，并提供了使用维度依赖于结构图参数的嵌入来表示层次结构的理论保证。我们证明，对于任何有向树，存在常数维度3的可达性嵌入，与树的大小或深度无关。我们将这一结果推广到以树宽$t$为特征的图，构造了维度为$O(t \log n)$的嵌入，其中$n$是节点数。作为这些上界的补充，我们提供了匹配或接近匹配的下界，表明对于一般DAG，维度$\Omega(n)$是必要的，而对于树宽为$t$的图，需要$\Omega(t/\log(n/t))$的维度。我们还获得了由DAG中交叉边数量参数化的上界和下界。此外，我们展示了我们的嵌入可以在真实世界数据集上构建，并且与先前具有理论保证的嵌入相比，在高召回率情况下维度小得多。

英文摘要

Computing geometric representations of data is a cornerstone of modern machine learning, typically achieved by training dual encoders which map queries and documents into a shared embedding space. Recent work of You et al. [NeurIPS '25] has extended this approach to hierarchical retrieval, where relevance is determined by the ancestor-descendant relationships in a Directed Acyclic Graph (DAG). While previous work has shown that valid embeddings exist when the number of descendants is small, these bounds degrade significantly for deep hierarchies, requiring dimensions as large as the total number of nodes. In this paper, we investigate compact reachability embeddings for more general graph classes and provide theoretical guarantees for representing hierarchies using embeddings whose dimension depends on structural graph parameters. We prove that for any directed tree, there exists a reachability embedding in constant dimension 3, independent of the tree's size or depth. We generalize this result to graphs characterized by treewidth $t$, constructing embeddings of dimension $O(t \log n)$, where $n$ is the number of nodes. Complementing these upper bounds, we provide matching or near-matching lower bounds, showing that dimension $Ω(n)$ is necessary for general DAGs and $Ω(t/\log(n/t))$ is required for graphs of treewidth $t$. We also obtain upper and lower bounds parameterized by the number of cross-edges in the DAG. We additionally show that our embeddings can be constructed on real world datasets, and that they give much smaller dimensions in high recall regimes compared to prior embeddings with theoretical guarantees.
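For intuition on why trees admit very compact reachability encodings, here is a minimal sketch of classic DFS interval labelling. It is a different, well-known technique and not the paper's dimension-3 geometric embedding, whose construction the abstract does not describe. Each node gets an (entry, exit) pair, and `u` reaches `v` exactly when `v`'s interval lies inside `u`'s. The names `interval_labels` and `is_ancestor` are illustrative.

```python
def interval_labels(children, root):
    """Assign each tree node an (entry, exit) pair using an iterative DFS."""
    labels = {}
    clock = 0
    stack = [(root, False)]
    while stack:
        node, finished = stack.pop()
        if finished:
            # All descendants are labelled: close this node's interval.
            labels[node] = (labels[node][0], clock)
            clock += 1
            continue
        labels[node] = (clock, None)  # open the interval
        clock += 1
        stack.append((node, True))
        for child in reversed(children.get(node, [])):
            stack.append((child, False))
    return labels


def is_ancestor(labels, u, v):
    """True when u equals v or u is an ancestor of v (interval containment)."""
    return labels[u][0] <= labels[v][0] and labels[v][1] <= labels[u][1]


tree = {"a": ["b", "c"], "b": ["d"]}  # hypothetical toy hierarchy
labels = interval_labels(tree, "a")
print(is_ancestor(labels, "a", "d"), is_ancestor(labels, "b", "c"))
```

Two numbers per node suffice here regardless of tree size or depth, which matches the flavour of the constant-dimension result for directed trees. General DAGs break the nesting property, which is where the treewidth-dependent bounds come in.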


2606.18480 2026-06-18 eess.AS cs.SD 新提交

Generalised Transcoding Framework for Arbitrary Spatial Audio Capture and Playback Formats

任意空间音频采集与回放格式的通用转码框架

Archontis Politis, Janani Fernandez, Leo McCormack

Affiliations: Faculty of Information Technology and Communication Sciences, Tampere University; Department of Information and Communications Engineering, Aalto University

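AI summary: Proposes a unified framework that estimates time-frequency spatial metadata (covering the primary source components and the angular power distribution of the ambience component) to transcode Ambisonic or raw microphone array signals to arbitrary target playback formats, with support for independent rotations. Listening tests show perceptual benefits over existing parametric renderers.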

Comments This work has been submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing for possible publication


AI中文摘要

本文介绍了一种统一框架，用于对以Ambisonic信号或原始麦克风阵列信号形式捕获的空间声场景进行参数化分析和再现。所提出的方法估计时频相关的空间元数据，该元数据表征可变数量的主源分量和具有自身角功率分布的环境分量，其参数拟合捕获信号的观测空间协方差。该元数据用于构建目标回放格式的空间协方差，然后用于推导最优混合矩阵，以将场景转码用于目标再现系统上的回放。该方法还独立处理采集和回放设置的旋转。在听力测试中，使用来自Ambisonic、球形和头戴式阵列的模拟场景，比较了该方法的实时实现和其他现有的最先进参数化渲染器。结果突出了所提出框架在多种内容和接收器配置下的感知优势，特别是对于低阶和几何约束的麦克风阵列。

英文摘要

This article introduces a unified framework for the parametric analysis and reproduction of spatial sound scenes captured either as Ambisonic signals or as raw microphone array signals. The proposed method estimates time-frequency-dependent spatial metadata that characterises a variable number of primary source components and an ambience component with its own angular power distribution, whose parameters fit the observed spatial covariances of the captured signals. This metadata is used to construct spatial covariances of the target playback formats, which are then used to derive optimal mixing matrices for transcoding the scene for playback over the target reproduction system. The method additionally handles independent rotations of both capture and playback setups. Real-time implementations of the method and other existing state-of-the-art parametric renderers are compared in a listening test using simulated scenes from Ambisonic, spherical, and head-worn arrays. The results highlight perceptual benefits of the proposed framework across a diverse range of content and receiver configurations, particularly for lower-order and geometrically constrained microphone arrays.


2606.18467 2026-06-18 stat.ML cs.LG 新提交

ToolChain-CRC: Conformal Risk Control for Agentic AI Under Retrieval and Tool-Use Drift

ToolChain-CRC: 检索与工具使用漂移下代理型AI的共形风险控制

Jeffery Opoku, David Banahene

Affiliations: The University of Texas Rio Grande Valley; Florida International University

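AI summary: For risk control of retrieval-augmented and tool-using agents under drift, proposes ToolChain-CRC, which builds trajectory-level risk scores and calibrates an accept-or-intervene rule to obtain provable trajectory-level risk control.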

Comments 26 pages, 11 figures

详情

AI中文摘要

现代AI代理检索文档、调用工具、检查中间信息，然后产生最终答案或行动。这产生了一个仅从最终答案无法察觉的风险控制问题。即使检索薄弱、工具输出错误或早期步骤缺乏支持，最终响应也可能看起来可接受。我们提出ToolChain-CRC，一种针对漂移下检索增强和工具使用代理的共形风险控制方法。该方法将每次代理运行视为动作、观察和最终输出的完整轨迹。它构建步骤级风险评分，将其组合成轨迹风险评分，校准接受或干预规则，并添加一个随时报警，可在最终答案前停止风险运行。我们在可交换校准运行下证明了轨迹级风险控制，给出了具有可审计常数的漂移感知扩展，并通过超鞅构造证明了随时升级规则。实验涵盖合成工具链漂移、RAG/工具使用压力测试、基于SQuAD的公共检索任务、无API代理问答案例研究、消融实验、目标风险敏感性检查、20种子鲁棒性检查、漂移边界审计以及实时RAG/工具使用代理基准。在这些设置中，仅基于最终答案的校准可能遗漏检索和工具故障，而轨迹级校准将接受轨迹的风险保持在目标之下。

英文摘要

Modern AI agents retrieve documents, call tools, check intermediate information, and then produce a final answer or action. This creates a risk-control problem that is not visible from the final answer alone. A final response may look acceptable even when the retrieval was weak, a tool output was wrong, or an earlier step was unsupported. We propose ToolChain-CRC, a conformal risk-control method for retrieval-augmented and tool-using agents under drift. The method treats each agent run as a full trajectory of actions, observations, and final output. It builds step-level risk scores, combines them into a trajectory risk score, calibrates an accept-or-intervene rule, and adds an anytime alarm that can stop risky runs before the final answer. We prove trajectory-level risk control under exchangeable calibration runs, give a drift-aware extension with auditable constants, and prove an anytime escalation rule through a supermartingale construction. Experiments cover synthetic tool-chain drift, RAG/tool-use stress tests, public SQuAD-derived retrieval tasks, an API-free agentic QA case study, ablations, target-risk sensitivity checks, 20-seed robustness checks, a drift-margin audit, and a live RAG/tool-use agent benchmark. Across these settings, final-answer-only calibration can miss retrieval and tool failures, while trajectory-level calibration keeps accepted-trajectory risk below the target.
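The calibrate-then-accept idea can be sketched with the standard conformal risk control recipe for a bounded, monotone loss. This is a simplified illustration under stated assumptions, not the ToolChain-CRC procedure itself: aggregating step scores with `max`, the 0/1 loss "accepted and failed", and the function names are all hypothetical choices, and the paper's drift-aware and anytime components are omitted.

```python
def trajectory_score(step_scores):
    """Combine step-level risk scores into one trajectory score.

    Assumption: the riskiest step dominates, so take the maximum.
    """
    return max(step_scores)


def calibrate_threshold(cal_scores, cal_failures, alpha):
    """Pick the largest acceptance threshold whose adjusted risk stays <= alpha.

    A run is accepted when its score is <= tau. The loss of a calibration run
    is 1 if it is accepted and failed, else 0, so the loss is bounded by 1 and
    non-decreasing in tau. The (bad + 1) / (n + 1) term is the usual
    finite-sample conformal risk control correction for exchangeable runs.
    Returns None when no threshold qualifies (intervene on every run).
    """
    n = len(cal_scores)
    best = None
    for tau in sorted(set(cal_scores)):
        bad_accepts = sum(
            1 for s, failed in zip(cal_scores, cal_failures) if s <= tau and failed
        )
        if (bad_accepts + 1) / (n + 1) <= alpha:
            best = tau
        else:
            break  # risk only grows for larger thresholds
    return best


# Hypothetical calibration data: trajectory scores and failure flags.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
failures = [False, False, False, False, False, True, False, True, True]
print(calibrate_threshold(scores, failures, alpha=0.25))
```

At deployment, a new run would be accepted only if `trajectory_score(...)` is at most the calibrated threshold, and routed to intervention otherwise.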


2606.18436 2026-06-18 stat.ML cs.LG 新提交

Pointwise is Pointless? A Multimodal Ablation Study for Precipitation Nowcasting with Graph Neural Networks

逐点是否无意义？基于图神经网络的降水临近预报的多模态消融研究

Ophélia Miralles, Máté Mile, Christoffer Artturi, Thomas Nipen, Ivar Seierstad

Affiliations: Norwegian Meteorological Institute

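AI summary: Using a multimodal graph neural network system, this ablation study analyzes how radar, numerical weather prediction, surface observations, satellite data, and the training loss affect precipitation nowcasting. Each modality improves a different aspect of the forecast. Point observations improve local skill, but their benefit for radar fields depends on the loss function and the uncertainty representation.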


AI中文摘要

稀疏点观测在降水临近预报中日益可用，但尚不清楚它们能在多大程度上改善密集雷达场预报。我们通过北欧雷达区域的多模态图神经网络临近预报系统部分回答了这个问题。该模型预测未来两小时内每五分钟的降雨率，并采用雷达历史、MEPS数值天气预报、Netatmo地面观测、MSG卫星通道、随机噪声和基于CRPS的集合损失的不同组合进行训练。本研究设计为对操作相关信源和训练目标的消融。我们比较了仅雷达、NWP信息、站点信息、卫星信息、噪声增强和基于CRPS的配置，使用雷达网格、站点位置、降雨起始的互补诊断，以及oracle、位移和幅度评分。结果表明，每个信源改善了预报问题的不同方面。MEPS稳定了仅雷达外推，Netatmo观测改善了局部站点和起始诊断，卫星预测因子减少了某些站点级偏差，但在确定性使用时可能过早激活降雨。基于CRPS的配置提供了最一致的雷达网格增益，而卫星与CRPS的组合设置给出了最佳的整体oracle/DAS评分。这些结果不支持点观测对临近预报无用的结论，但表明局部观测技能和空间相干雷达场技能是不同的目标。实际意义是，稀疏观测可以提供有用的局部约束，但它们对雷达类场的益处取决于训练损失、不确定性表示以及观测支持在模型中的编码方式。

英文摘要

Sparse point observations are increasingly available for precipitation nowcasting, but it is unclear how much they improve dense radar-field forecasts. We partially address this question with a multimodal graph neural network nowcasting system over the Nordic radar domain. The model predicts rain rate every five minutes up to two hours ahead and is trained with different combinations of radar history, MEPS numerical weather prediction, Netatmo surface observations, MSG satellite channels, stochastic noise, and CRPS-based ensemble losses. The study is designed as an ablation of operationally relevant information sources and training objectives. We compare radar-only, NWP-informed, station-informed, satellite-informed, noise-augmented, and CRPS-based configurations using complementary diagnostics on the radar grid, at station locations, for rain onset, and through oracle, displacement, and amplitude scores. The results show that each source improves a different part of the forecast problem. MEPS stabilises radar-only extrapolation, Netatmo observations improve local station and onset diagnostics, and satellite predictors reduce some station-level biases but may activate rain too early when used deterministically. CRPS-based configurations provide the most consistent radar-grid gains, while the combined satellite and CRPS setup gives the best overall oracle/DAS score. These results do not support the conclusion that point observations are uninformative for nowcasting, but they show that local observational skill and spatially coherent radar-field skill are distinct targets. The practical implication is that sparse observations can provide useful local constraints, but their benefit for radar-like fields depends on the training loss, uncertainty representation, and how observation support is encoded in the model.
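As background for the CRPS-based ensemble losses mentioned above, the following minimal sketch computes the standard empirical CRPS of an ensemble forecast against a scalar observation. It shows the generic estimator only; how the paper applies it per grid point or per station is not stated in the abstract, and the function name `ensemble_crps` is illustrative.

```python
def ensemble_crps(members, observation):
    """Empirical CRPS: mean |x_i - y| minus half the mean |x_i - x_j|."""
    m = len(members)
    # Accuracy term: average distance between members and the observation.
    accuracy = sum(abs(x - observation) for x in members) / m
    # Spread term: rewards ensembles whose members disagree appropriately.
    spread = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return accuracy - spread


# Hypothetical rain-rate ensemble (mm/h) and an observed value.
print(ensemble_crps([0.0, 2.0], 1.0))
```

With a single member the spread term vanishes and the score reduces to the absolute error, which is why CRPS is often described as a probabilistic generalization of MAE.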


2606.18402 2026-06-18 eess.SP cs.AI cs.AR cs.SY eess.SY 新提交

Deep-Learning-Based Pixelated Microwave Filter Design and Characterization using Electro-Optical Electric-Field Measurements

基于深度学习的像素化微波滤波器设计与表征：利用电光电场测量

Han Zhou, Richard Bannister, Caspar Pierce, Haojie Chang, David Widen, Ludvig Fornstedt, Gabriel Melin, Alexander Bohlin, Pontus Lindeberg Fredriksson, Dilbagh Singh, Christian Fager, Koen Buisman

Affiliations: Chalmers University of Technology; Advanced Technology Institute, University of Surrey; National Physical Laboratory

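AI summary: Proposes a deep learning approach that combines a convolutional neural network with a genetic algorithm to automatically synthesize pixelated microwave filters, validated experimentally through S-parameter and spatial electric-field measurements. The design achieves a 7 GHz passband and more than 20 dB rejection above 9.5 GHz, and electro-optical measurements are used for the first time to reveal the electric-field patterns of AI-generated designs.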

2606.18395 2026-06-18 eess.SP cs.AI cs.AR cs.SY eess.SY 新提交

Deep Learning-Driven Inverse Design of Doherty Power Amplifiers Using Pixelated Combiners and Dual-State Impedance Synthesis

基于深度学习的Doherty功率放大器逆向设计：使用像素化合成器和双态阻抗合成

Han Zhou, Haojie Chang, David Widen, Christian Fager

Affiliations: Tampere University; Chalmers University of Technology

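AI summary: Proposes a three-port Doherty combiner design method that combines a deep convolutional neural network, pixelated layouts, and a genetic algorithm to achieve dual-state impedance synthesis under peak and back-off power conditions, with saturated output power >44.2 dBm and peak drain efficiency >71.2% across the 2.6-2.8 GHz band.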

2606.18354 2026-06-18 eess.IV cs.LG 新提交

Structural MRI Synthesis for Alzheimer's Disease via Conditional Diffusion on Anatomical Masks

基于解剖掩膜条件扩散的阿尔茨海默病结构MRI合成

Muge Zhang, Muhammad Ali Khaliq, Jamal Alsakran, Byeong Kil Lee, Jeeho Ryoo

Affiliations: Fairleigh Dickinson University; University of Colorado at Colorado Springs

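AI summary: To address the difficulty of capturing subtle anatomical changes when synthesizing structural MRI for Alzheimer's disease, the paper extends the Med-DDPM conditional diffusion model to generate 3D structural MRIs conditioned on anatomical segmentation masks. Models trained on synthetic data reach Dice scores comparable to those trained on real data, and training on hybrid data improves performance substantially.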

详情

DOI: 10.1109/MIPR67560.2025.00037
Journal ref: 2025 IEEE 8th International Conference on Multimedia Information Processing and Retrieval (MIPR)

AI中文摘要

生成式机器学习模型的最新进展显著改善了医学成像，为数据增强、隐私保护和模型泛化提供了有前景的解决方案。然而，由于神经退行性病变相关的细微、区域特异性和渐进性解剖变化，合成阿尔茨海默病（AD）的高质量结构MRI数据仍然具有挑战性。在本文中，我们将最初为脑肿瘤合成设计的Med-DDPM条件扩散模型扩展，以生成专门针对AD的3D结构MRI。我们采用Med-DDPM，因为与其他生成模型相比，它具有稳定的结构和保真度，特别适合捕捉AD特征的细微解剖变化。我们的方法以来自ADNI数据集的解剖分割掩膜为条件，将关键的AD相关脑结构纳入生成过程。我们通过在真实、合成和混合数据集上训练分割模型，系统评估了合成图像的质量和实用性。实验结果表明，仅在合成数据上训练的分割模型达到了与真实数据训练（0.6513）相当的Dice分数（0.6532），同时召回率显著提高。值得注意的是，在混合数据集（混合真实和合成图像）上训练的模型优于真实和纯合成基线，Dice分数达到0.7244。这些发现强调了条件扩散模型在生成解剖准确、AD特异性合成MRI方面的成功应用，并突出了它们在增强训练数据可用性、提高诊断准确性和促进神经影像研究可重复性方面的潜力。

英文摘要

Recent advances in generative machine learning models have significantly improved medical imaging, offering promising solutions for data augmentation, privacy preservation, and improved model generalization. However, synthesizing high-quality structural MRI data for Alzheimer's Disease (AD) remains challenging due to the subtle, region-specific, and progressive anatomical changes associated with neurodegeneration. In this paper, we extend the Med-DDPM conditional diffusion model -- originally designed for brain tumor synthesis -- to generate 3D structural MRIs specifically tailored to AD. We adopted Med-DDPM due to its established stability and structural fidelity compared to other generative models, which makes it particularly suitable for capturing the subtle anatomical changes characteristic of AD. Our approach conditions the diffusion process on anatomical segmentation masks derived from the ADNI dataset, incorporating key AD-relevant brain structures into the generation process. We systematically evaluate the quality and utility of the synthetic images by training segmentation models on real, synthetic, and hybrid (mixed) datasets. Experimental results demonstrate that segmentation models trained exclusively on synthetic data achieve comparable Dice scores (0.6532) to those trained on real data (0.6513), while exhibiting significantly enhanced recall. Notably, models trained on hybrid datasets (mixing real and synthetic images) outperform both real and synthetic-only baselines, achieving a Dice score of 0.7244. These findings underscore the successful use of conditional diffusion models for generating anatomically accurate, AD-specific synthetic MRIs, and highlight their potential for enhancing training data availability, improving diagnostic accuracy, and promoting research reproducibility in neuroimaging studies.
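For readers unfamiliar with the Dice scores quoted above (0.6513, 0.6532, 0.7244), this minimal sketch computes the Dice coefficient between two binary masks given as flat lists of 0/1 values. Returning 1.0 when both masks are empty is a convention assumed here, and the paper's exact evaluation pipeline (per-structure averaging, 3D volumes) is not specified in the abstract.

```python
def dice_score(pred, target):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * overlap / total


# Hypothetical 4-voxel masks: one voxel of overlap out of two each.
print(dice_score([1, 1, 0, 0], [1, 0, 1, 0]))
```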


2606.18302 2026-06-18 q-bio.OT cs.LG 新提交

Protein-Based Fish Species Identification: Dataset, Models, and Insights from Native Bangladeshi Fish

基于蛋白质的鱼类物种识别：孟加拉本土鱼类的数据集、模型与见解

Md Nasiat Hasan Fahim, Md. Abid Ullah Muhib, Mohammad Shahidur Rahman

Affiliations: Shahjalal University of Science

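AI summary: Builds the first protein sequence dataset for native Bangladeshi fish, systematically evaluates seven architectures, and proposes a lightweight hybrid model, MotifCNN-Transformer+TA-PE, which is more practical to deploy in resource-constrained settings than the large protein language model ProtBERT, at an accuracy gap that is not statistically significant.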

Comments Published in 2026 IEEE 2nd International Conference on Quantum Photonics, Artificial Intelligence & Networking (QPAIN). © 2026 IEEE. Personal use of this material is permitted


DOI: 10.1109/QPAIN69676.2026.11546620
Journal ref: 2026 IEEE 2nd International Conference on Quantum Photonics, Artificial Intelligence & Networking (QPAIN)

AI中文摘要

在孟加拉国，正确识别鱼类物种对于粮食安全、经济发展和气候适应性至关重要。蛋白质序列直接反映功能和进化约束，对物种认证和生物多样性监测具有重要意义。然而，目前尚无针对孟加拉本土鱼类物种的蛋白质序列识别基准。本研究通过引入首个包含9种孟加拉本土鱼类2845条高质量蛋白质序列的精选数据集来填补这一空白。我们还通过对七种架构范式进行系统基准测试，建立了该领域首个蛋白质序列分类基线。此外，我们提出了一种实用的新型混合架构——MotifCNN与具有末端感知位置编码的Transformer（MotifCNN-Transformer+TA-PE）。该新架构实现了79.80%的准确率和0.80的宏F1分数。最高准确率83.04%由微调的蛋白质语言模型ProtBERT取得，该模型有4.2亿参数，需要双16GB GPU进行推理。根据McNemar检验，ProtBERT相比我们的MotifCNN-Transformer+TA-PE的3.24%准确率提升在统计上不显著（p = 0.1120）。在九类中的六类上，我们的新架构在每类识别中优于ProtBERT。此外，我们的MotifCNN-Transformer+TA-PE比ProtBERT快约5倍，小42倍，支持16倍更大的批处理大小，且无需GPU推理，使其在资源受限地区（如孟加拉农村）部署更为实用。除此之外，我们的基础性工作展示了系统发育关系对序列相似性的影响，并为南亚蛋白质依赖型经济中的渔业管理、食品认证和生物多样性保护建立了途径。

英文摘要

Correct identification of fish species matters for food security, economic development, and climate resilience in Bangladesh. Protein sequences directly reflect functional and evolutionary constraints, which makes them useful for species authentication and biodiversity monitoring. Yet no benchmark exists for identifying native Bangladeshi fish species from protein sequences. In this study, we address this gap by introducing the first curated dataset of 2845 high-quality protein sequences from nine native Bangladeshi fish species. We also establish the first protein sequence classification baseline for this domain through a systematic benchmark of seven architectural paradigms. Moreover, we propose a deployable hybrid architecture that combines MotifCNN and a Transformer with Terminal-Aware Positional Encoding (MotifCNN-Transformer+TA-PE). This architecture achieves 79.80% accuracy with a macro-F1 of 0.80. The highest accuracy, 83.04%, is achieved by the fine-tuned protein language model ProtBERT, which has 420M parameters and requires dual 16GB GPUs for inference. According to McNemar's test, ProtBERT's 3.24-percentage-point accuracy gain over MotifCNN-Transformer+TA-PE is not statistically significant (p = 0.1120), and our architecture outperforms ProtBERT on six of the nine classes in per-class identification. MotifCNN-Transformer+TA-PE is also approximately 5x faster and 42x smaller than ProtBERT, supports a 16x larger batch size, and runs inference without a GPU, making it more practical for deployment in resource-constrained areas such as rural Bangladesh. Beyond this, our foundational work shows the effects of phylogenetic relationships on sequence similarity and establishes pathways for fisheries management, food authentication, and biodiversity conservation in South Asia's protein-dependent economy.
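The significance claim above rests on McNemar's test, which compares two classifiers on the same test set using only the samples where they disagree. The sketch below implements the exact (binomial) two-sided variant. The abstract does not say which variant the authors used or report the disagreement counts, so the counts in the usage line are hypothetical.

```python
from math import comb


def mcnemar_exact(b, c):
    """Exact two-sided McNemar p-value.

    b: samples model A classifies correctly and model B misclassifies.
    c: samples model A misclassifies and model B classifies correctly.
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree
    k = min(b, c)
    # Probability of a split at least as lopsided as the observed one.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical disagreement counts between two classifiers.
print(mcnemar_exact(9, 3))
```

Because the test ignores samples both models get right or wrong, a few percentage points of accuracy difference on a modest test set can easily fail to reach significance, which is consistent with the reported p = 0.1120.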
