2606.12673 2026-06-12 cs.LG cs.AI New submission

A Zero-shot Generalized Graph Anomaly Detection Framework via Node Reconstruction

Phan Nguyen, Dat Cao, Hien Chu, Khue Hoang

Affiliation: School of Computing, KAIST

AI summary: Proposes the AlignGAD framework, which achieves zero-shot cross-domain graph anomaly detection through a Global Unification Module that aligns heterogeneous features, a Clustering Module that captures group-level abnormal patterns, and a Node Discrepancy Scoring Module that aggregates anomaly evidence across multiple views.

Abstract

Cross-domain graph anomaly detection (GAD) aims to identify abnormal nodes in unseen target graphs, showing strong potential in real-world applications with heterogeneous graph data. However, existing methods often depend on dataset-specific feature semantics and structural patterns, which limits their ability to generalize across different domains. To address this challenge, we propose AlignGAD, a zero-shot generalized graph anomaly detection framework. Our framework is built upon three key components: a Global Unification Module that aligns heterogeneous node features and normalizes graph signals in the spectral domain; a Clustering Module that constructs cluster-aware graph views to capture group-level abnormal patterns; and a Node Discrepancy Scoring Module that measures reconstruction discrepancy and aggregates anomaly evidence from different graph views. Experiments on multiple real-world datasets demonstrate the effectiveness of AlignGAD under the zero-shot GAD setting.

2606.12662 2026-06-12 cs.SD cs.AI cs.LG New submission

BASENet: Band-Adapted Speech Enhancement Network with Cross-Band Attention

Damien Martins Gomes, François Capman

Affiliation: Thales SIX GTS, France

AI summary: Proposes BASENet, which partitions the spectrum into Bark-scale bands, assigns each band an encoder of adapted capacity, and adds a cross-band attention module, achieving high PESQ and STOI with the fewest parameters and making it suitable for resource-constrained devices.

Abstract

Speech enhancement models typically apply uniform capacity across all frequencies, disregarding the non-uniform spectral resolution of human hearing. We propose BASENet, a frequency-adapted architecture that partitions the spectrum into Bark-scale bands and assigns each a scaled-capacity encoder derived from critical-band density, automatically granting deeper branches to perceptually dense low frequencies and lighter ones to high frequencies. A cross-band attention module captures harmonic dependencies across bands through compact frequency-pooled representations at linear complexity. Built on inverted residual blocks with dense connectivity and a convolutional recurrent network, BASENet achieves 3.55 PESQ and STOI~96% on VoiceBank+DEMAND with only 0.83M parameters and 7.3 G MACs, the fewest parameters among all methods with PESQ > 3.50. A causal variant (3.44 PESQ) surpasses several non-causal baselines, confirming suitability for real-time streaming on resource-constrained devices.
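The Bark-scale partition described in the abstract can be illustrated with a small sketch. This is not the BASENet implementation: the function names, the choice of equal-width Bark bands, and the sample rate, FFT size, and band count are all assumptions made for illustration. It uses the common Zwicker-style approximation of the Bark scale and only shows how low-frequency bands end up covering far fewer FFT bins than high-frequency ones.

```python
import math

def hz_to_bark(f_hz: float) -> float:
    """Zwicker-style approximation of the Bark scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def bark_band_edges(sample_rate: int, n_fft: int, n_bands: int) -> list[int]:
    """Return n_bands + 1 FFT-bin indices splitting [0, Nyquist] into bands
    of equal width on the Bark axis (hypothetical helper, not from the paper)."""
    n_bins = n_fft // 2 + 1
    bin_hz = sample_rate / n_fft
    bark_per_bin = [hz_to_bark(k * bin_hz) for k in range(n_bins)]
    max_bark = bark_per_bin[-1]
    edges = [0]
    for b in range(1, n_bands):
        target = max_bark * b / n_bands
        # First bin whose Bark value reaches the target boundary.
        k = next(i for i, z in enumerate(bark_per_bin) if z >= target)
        # Guarantee every band contains at least one bin.
        edges.append(max(k, edges[-1] + 1))
    edges.append(n_bins)
    return edges

# Illustrative settings only: 16 kHz audio, 512-point FFT, 8 bands.
edges = bark_band_edges(sample_rate=16000, n_fft=512, n_bands=8)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
print(edges)
print(widths)  # low bands are narrow in bins, high bands are wide
```

How encoder depth is derived from critical-band density is not specified in the abstract, so the sketch stops at the partition itself.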

2606.12658 2026-06-12 cs.LG q-bio.QM stat.ML New submission

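TEDD: Robust Detection of Unstable Temporal Features

Ricardo Ribeiro Pereira, Bruno Casal Laraña, Nádia Soares, Miguel Araújo

Affiliation: Feedzai

AI summary: Proposes TEDD, a method that uses regression models to detect the features responsible for changes in distribution over time; it requires no parameter tuning, is scalable, and detects univariate and multivariate drift in both numerical and categorical features.

Comments 8 pages, 9 figures

DOI: 10.1109/ICDMW51313.2020.00063

Abstract (translated from the AI-generated Chinese version)

When working with real-world time-series data, it is common to encounter feature distributions that change over time. Directly using machine learning models on such unstable data can lead to a rapid drop in performance, especially when the new distribution differs substantially from the one seen during training. To address this, it is essential to automatically identify features that change over time. Once these are detected, data scientists and other practitioners can mitigate the problem, for example by applying data transformations, and deploy more robust models that maintain high performance for longer. This paper describes the types of temporal change that features should not undergo and proposes TEDD, a technique for (a) identifying when a dataset may lead to unstable machine learning models and (b) automatically detecting which features cause this lack of robustness. To do so, we use regression models to highlight which features help predict an instance's timestamp well. We compare our method with others on real and synthetic data, testing their ability to detect all simple patterns of change. We show that our method: detects all types of basic change, for both numerical and categorical features; can detect multivariate drift; returns a comparable value measuring how much each feature changes; requires no parameter tuning; and scales in both the number of features and the number of instances in the dataset.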
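The core idea, that a feature which helps predict an instance's timestamp is a feature that drifts, can be sketched as follows. This is a simplified stand-in rather than TEDD itself: the abstract does not name the regressor, so the sketch fits a separate least-squares line per feature and uses its R² as the drift score, which only picks up linear shifts in the mean. The function names and the synthetic data are assumptions.

```python
import random

def r_squared(x: list[float], t: list[float]) -> float:
    """R^2 of the least-squares line that predicts timestamp t from feature x."""
    n = len(x)
    mx, mt = sum(x) / n, sum(t) / n
    sxx = sum((a - mx) ** 2 for a in x)
    stt = sum((b - mt) ** 2 for b in t)
    sxt = sum((a - mx) * (b - mt) for a, b in zip(x, t))
    if sxx == 0 or stt == 0:
        return 0.0  # a constant column carries no information about time
    return (sxt * sxt) / (sxx * stt)

def drift_scores(features: dict[str, list[float]], t: list[float]) -> dict[str, float]:
    """Score each feature by how well it alone predicts the timestamp."""
    return {name: r_squared(col, t) for name, col in features.items()}

# Synthetic example: one stationary feature, one whose mean rises over time.
rng = random.Random(0)
t = [float(i) for i in range(500)]
features = {
    "stable": [rng.gauss(0.0, 1.0) for _ in t],
    "drifting": [0.01 * ti + rng.gauss(0.0, 1.0) for ti in t],
}
scores = drift_scores(features, t)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

Because each score is an R² in [0, 1], scores are comparable across features; detecting multivariate or non-linear drift would need a joint, more flexible regressor than this per-feature line.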

Keep Policy Gradient in Charge: Sibling-Guided Credit Distillation for Long-Horizon Tool-Use Agents

Tianyu Ding, Jianhong Xin, Juan Pablo De la Cruz Weinstein

Affiliation: Amazon Web Services

AI summary: To address the sparsity of trajectory-level advantage signals in long-horizon tool-use reinforcement learning, proposes Sibling-Guided Credit Distillation (SGCD), which achieves dense credit assignment by dynamically sampling successful and failed rollouts and having an external LLM contrast them into a stepwise credit reference, significantly improving performance on AppWorld and τ³-airline.

Comments 13 pages, 4 figures, 7 tables. Submitted to EMNLP 2026 Industry Track

Abstract

Long-horizon tool-use reinforcement learning can learn from outcome verification, but its trajectory-level advantage is broadcast across many reasoning, API, and answer tokens. Self-distillation promises a denser signal by reusing a policy's own rollouts or a privileged teacher. We show, however, that direct token-level self-distillation can silently destroy tool use: it rehearses teacher behavior without knowing which actions the verifier rewards, so useful skills and harmful shortcuts are amplified together. We introduce Sibling-Guided Credit Distillation (SGCD), which uses distillation for credit assignment rather than as a competing actor loss. Dynamic sampling produces mixed successful and failed sibling rollouts; an external LLM summarizes their contrast into a training-only stepwise credit reference; dense teacher/student divergence drives credit reassignment; and bounded detached credit weights reshape GRPO token advantages. The deployed student sees no external LLM, sibling evidence, or oracle. Across AppWorld and $τ^3$-airline, SGCD improves over matched GRPO comparators: AppWorld TGC $42.9 \to 45.6$ on test_normal and $24.7 \to 27.0$ on test_challenge, and $τ^3$-airline pass@1 $0.583 \to 0.602$.
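The last step of the abstract, bounded credit weights reshaping GRPO token advantages, can be sketched in a few lines. This is a hedged illustration only: the abstract does not say how credit weights are computed from the teacher/student divergence, so they are passed in as plain numbers, and the clipping bounds, the multiplicative form, and the function names are assumptions.

```python
from statistics import mean, pstdev

def grpo_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Group-normalized trajectory advantages: (r - mean) / std over one group.
    Population standard deviation is used here; implementations differ."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def reshape_token_advantages(traj_adv: float, credit: list[float],
                             lo: float = 0.5, hi: float = 1.5) -> list[float]:
    """Scale one trajectory-level advantage by per-token credit weights clipped
    to [lo, hi]. The weights are plain constants here, which mirrors 'detached':
    no gradient would flow through them in an autodiff framework."""
    return [traj_adv * min(hi, max(lo, c)) for c in credit]

# Four sibling rollouts of one task: two succeed, two fail.
advantages = grpo_advantages([1.0, 0.0, 0.0, 1.0])

# Hypothetical per-token credit for the first rollout (three tokens).
token_advantages = reshape_token_advantages(advantages[0], [0.1, 1.0, 3.0])
print(advantages)
print(token_advantages)
```

Because the weights are bounded and only rescale the advantage, the sign of each token's update is still set by the verifier outcome, which matches the abstract's framing of distillation as credit assignment rather than a competing actor loss.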

2606.12633 2026-06-12 cs.CV cs.LG New submission

ECA: Efficient Continual Alignment for Open-Ended Image-to-Text Generation

Jiangtao Kong, Peijun Zhao, Chun-Fu Chen, Youngwook Do, Shaohan Hu, Tianyi Zhou, Huajie Shao

Affiliation: University of Science and Technology of China

AI summary: Proposes ECA, which uses a Mixture of Query module, Fisher Dynamic Expansion, and Dictionary Replay to achieve continual alignment without access to old data, mitigating catastrophic forgetting and improving incremental learning performance for open-ended image-to-text generation.

Comments Accepted at the 43rd International Conference on Machine Learning (ICML 2026)

Abstract

Incremental Learning (IL) for Open-ended Image-to-Text Generation (OpenITG) enables models to continuously generate accurate, contextually relevant text for new images while preserving previously acquired knowledge. Unlike prior studies, this paper addresses a more practical scenario in which the predominant category of visual data shifts over time as environments evolve. In this context, we introduce a new notion of continual alignment, which incrementally adapts the alignment module within pre-trained VLMs to preserve high-quality cross-modal representations. Based on this idea, we propose Efficient Continual Alignment (ECA), a novel exemplar-free IL approach for OpenITG. The key challenge is enabling the model to acquire new, task-specific features while minimizing interference with the established alignment without accessing raw data from previous tasks. To address this, ECA employs three core mechanisms: a Mixture of Query (MoQ) module that adapts task-specific query tokens, a Fisher Dynamic Expansion (FeDEx) that dynamically expands model structure based on a Fisher Information Matrix (FIM)-based metric, and an embedding dictionary with Dictionary Replay (DR) to retain past knowledge. To evaluate ECA's performance, we construct four new IL OpenITG benchmarks that better reflect real-world scenarios. Experimental results demonstrate that ECA significantly mitigates catastrophic forgetting and improves IL performance compared to baseline methods. Code and benchmarks are available at https://github.com/Snowball0823/ECA.

2606.12628 2026-06-12 cs.CV New submission

Context-Aware Feature-Fusion for Co-occurring Object Detection in Autonomous Driving

Binay Kumar Singh, Niels Da Vitoria Lobo

Affiliation: Department of Computer Science, University of Central Florida

AI summary: Proposes the Context-Centric Feature Fusion (CCFF) framework, in which a Local Context Fusion Module handles small and occluded objects and a Global Context Attention Module handles co-occurrence priors, improving co-occurring object detection; it reaches a Category-level Consistency Strategy (CCS) of 0.973 on Cityscapes and 0.969 on BDD100K, with gains in small-object detection (AP_S: 14.1%).

Comments 8 pages, 3 figures, CVPR 2026 Precognition Workshop

Abstract

Object detection in autonomous driving requires precise localization and an inherent understanding of the relational context between co-occurring objects. In extremely complex heterogeneous environments, rare classes, small-scale objects, and frequently appearing objects are difficult for standard object detection frameworks to handle. In this paper, we propose a novel framework called Context-Centric Feature Fusion (CCFF), which utilizes two attention-based modules: the Local Context Fusion Module (LCFM) uses an RoI-to-RoI self-attention mechanism to resolve spatial interactions, mainly considering small and partially obscured objects, while the Global Context Attention Module (GCAM) converts object co-occurrence priors into a global context attention token by pooling top-K RoI features, avoiding the computational overhead of pixel-level global pooling. This fusion of local and object-centric global features yields contextualized embeddings that enhance classification results and co-occurring object detection. Our method is evaluated on two datasets, Cityscapes and BDD100K, and demonstrates significant improvement in relational consistency, achieving a Category-level Consistency Strategy (CCS) of 0.973 and 0.969, respectively. Furthermore, our approach produces substantial gains in small object detection (AP_S: 14.1%) and successfully recovers rare classes such as "Train" that are typically lost in large distributions. Our efficiency report shows that the framework processes images in real time with a 0.2 FPS overhead. The code is available at https://github.com/BinayKSingh/CCFF.

2606.12616 2026-06-12 cs.AI cs.CL New submission

PersonaDrive: Human-Style Retrieval-Augmented VLA Agents for Closed-Loop Driving Simulation

Mahmoud Srewa, Praneetsai Iddamsetty, Mohammad Abdullah Al Faruque, Salma Elmalaki

Affiliation: University of California, Irvine

AI summary: Proposes the PersonaDrive pipeline, which conditions a vision-language-action (VLA) driving agent on retrieved human driving demonstrations recorded under style instructions, enabling diverse non-ego agent behavior in closed-loop simulation without per-style retraining.

Abstract

Closed-loop driving simulators typically populate their environments with non-ego traffic agents that behave largely the same way, produced either by rule-based traffic managers or by learned models trained toward a single behavioral mode. Recent work introduces style variation through post-hoc labels on observational data or LLM-inferred reward weights, but these signals act as proxies for what a style should reward rather than demonstrations of humans explicitly asked to drive in that style. We introduce PersonaDrive, a pipeline that conditions a vision-language-action (VLA) driving agent on retrieved demonstrations from a style-instructed human driving dataset, in which participants drive CARLA leaderboard routes under aggressive, neutral, and conservative instructions on a driver-in-the-loop rig. The pipeline has three stages: (i) offline triplet mining over per-style human driving data using a combined image-text similarity score; (ii) training a lightweight retrieval head that fuses frozen visual features with a small control encoder over per-style databases; and (iii) fine-tuning a single VLA backbone to treat retrieved context points as in-context behavioral demonstrations during waypoint prediction. At inference, the same backbone is conditioned on any style by swapping which per-style database the retrieval head queries, so selecting a style requires no per-style retraining while enabling human-style, style-diverse non-ego agents for closed-loop simulation. On Bench2Drive, PersonaDrive (no style) improves the driving score by 4.6% over SimLingo and 2.5% over HiP-AD, and under style conditioning attains the highest driving score in every style within a roughly 2% band (its weakest style surpassing the strongest baseline, DMW, by 5.4%), while average speed and acceleration rise by 18% and 25% from the conservative to the aggressive instruction.

2606.12615 2026-06-12 cs.LG New submission

Towards Provably Fair Machine Learning: Bayesian Approaches For Consistent and Transparent Predictions

Owen O'Neill, Fintan Costello

Affiliation: University College Dublin

AI summary: Proposes the Fair Bayesian classifier, which enforces determinism and statistical consistency, achieving zero consistency error on multiple datasets while maintaining accuracy and multicalibration, and addressing the inconsistent predictions that regularisation causes for minority groups.

Abstract

ML classifiers deployed in high-stakes domains produce predictions whose quality varies systematically across subgroups. For granular subgroups defined by intersections of multiple features, predictions are often inconsistent with the observed data: the model's outputs contradict the evidence available for that subgroup. This problem is exacerbated by regularisation, which improves aggregate performance by collapsing small subgroups into larger groups, disproportionately affecting demographic minorities. We define two requirements for consistent prediction: determinism (identical individuals receive identical predictions) and statistical consistency (we cannot reject, at significance level alpha, the hypothesis that the predictions for a subgroup were drawn from the Bayesian optimal target distribution inferred for that subgroup). From these requirements we derive the Fair Bayesian classifier, which enforces both across every group and subgroup simultaneously and abstains whenever no consistent deterministic prediction is possible. On three benchmark datasets (Adult, COMPAS, and Bank Marketing), standard classifiers produce statistically inconsistent predictions for a substantial proportion of subgroups. Our classifier achieves zero consistency error by construction while exceeding baseline accuracy and multicalibration on every dataset tested. Statistical consistency provides a principled foundation for prediction quality with direct implications for algorithmic fairness. Minority demographics are disproportionately concentrated in small subgroups, precisely where frequentist inference is least reliable; addressing this inference problem is therefore a necessary step toward fair ML. By enforcing Bayesian consistency at the finest resolution the data supports, our classifier demonstrates that exhaustive subgroup fairness with principled abstention is achievable in practice.

2606.12614 2026-06-12 cs.RO New submission

DARRMS -- An Efficient Algorithm for Dynamic Attention Radius in Resource-Constrained Multi-Agent Systems

Benjamin Alcorn, Eman Hammad

Affiliation: Texas A&M University

AI summary: Proposes the DARRMS algorithm, which jointly optimizes the attention radius and decision-making to reduce computational demand under resource constraints while improving coordination and scalability.

Abstract

Multi-agent systems are integral tools for various domains such as robotics, cybersecurity, and autonomous vehicle planning. These types of systems often have constraints on computational resources, leading to a need for efficient lightweight algorithms. Traditional decision-making frameworks often assume ideal conditions, such as full observability and unlimited computational capacity, which do not align with real-world challenges. In this paper, we introduce a new algorithm that allows for reduced demand on computational resources without a large cost to other performance metrics. Agents limit their observability to some attention radius, which intentionally allows them to ignore parts of the environment that might be unnecessary for action planning. By optimizing both the attention radius and decision-making, our approach enhances coordination and scalability in uncertain environments. Through both theoretical analysis and empirical validation, we demonstrate the effectiveness of adaptive observation in improving system performance and maintaining robust decision-making strategies in resource-constrained systems.
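The attention-radius idea can be made concrete with a minimal sketch. It is not the DARRMS algorithm: the abstract does not describe how the radius is optimized, so the sketch only shows the observation filter for a fixed radius, with a hypothetical function name and made-up 2D positions.

```python
import math

Point = tuple[float, float]

def observe_within_radius(agent_pos: Point, others: list[Point], radius: float) -> list[Point]:
    """Keep only the entities within `radius` of the agent (Euclidean distance).
    Everything outside the radius is deliberately ignored by the planner."""
    return [p for p in others if math.dist(agent_pos, p) <= radius]

# Made-up positions of three other agents around an agent at the origin.
others = [(1.0, 1.0), (3.0, 4.0), (10.0, 0.0)]
wide = observe_within_radius((0.0, 0.0), others, radius=5.0)
narrow = observe_within_radius((0.0, 0.0), others, radius=2.0)
print(wide)    # [(1.0, 1.0), (3.0, 4.0)]
print(narrow)  # [(1.0, 1.0)]
```

Shrinking the radius reduces the number of entities the planner must process per step, which is the computational saving the abstract refers to; the trade-off against decision quality is what a radius optimizer would have to balance.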

2606.12610 2026-06-12 cs.LG New submission

The Mathematics of AI Winters: The mathematical Taxonomy of Paradigm Fragility in AI Winter

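Miquel Noguer i Alonso, David Pacheco Aznar

Affiliation: AIFI; Staq.io

AI summary: Offers a mathematical account of the AI winters, analyzing why early AI paradigms failed in terms of mathematical bottlenecks (perceptron impossibility results, the complexity of neural network training, high-dimensional nonparametric estimation rates, vanishing gradients, and statistical learning theory) and linking them to later breakthroughs.

Comments 33 pages, 1 figure

Abstract (translated from the AI-generated Chinese version)

The two major periods of reduced funding and declining confidence in artificial intelligence research, commonly referred to as the first and second AI winters, are usually explained in terms of engineering failures, commercial disappointment, and inflated expectations. This paper advances a complementary thesis: the dominant paradigms of those periods also ran into genuine formal obstacles, including limits on representation, optimization, computational complexity, statistical learnability, and high-dimensional approximation. The contribution is synthetic rather than archival. We do not claim that specific theorems mechanically caused the winters; rather, we show that several central disappointments of early AI coincide with mathematically precise bottlenecks. We analyze these bottlenecks through the perceptron impossibility results of Minsky and Papert, the computational hardness of exact neural network training established by Blum and Rivest, Stone's minimax rates for high-dimensional nonparametric estimation, the vanishing-gradient analyses of Hochreiter and of Bengio and collaborators, and classical statistical learning theory in the tradition of Vapnik and Chervonenkis, Valiant, and Blumer and collaborators. We then connect these obstacles to the later breakthroughs that mitigated, but did not eliminate, them.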

Pythagoras-Prover: Advancing Efficient Formal Proving via Augmented Lean Formalisation

Joshua Ong Jun Leang, Zheng Zhao, Mihaela Cătălina Stoian, Qiyuan Xu, Haonan Li, Wenda Li, Shay B. Cohen, Eleonora Giunchiglia

Affiliation: Imperial College London; University of Edinburgh; Nanyang Technological University; MBZUAI

AI summary: Proposes the Pythagoras-Prover family, comprising autoregressive and diffusion models, which uses curriculum SFT, dynamic filtering, and Augmented Lean Formalisation (ALF) to expand verified data, and surpasses DeepSeek-Prover-V2 on MiniF2F-Test with fewer parameters.

Comments Pythagoras-Prover: Technical Report

Abstract

Modern Lean theorem provers achieve strong performance only with substantial training and inference compute, driven in part by scarce verified proof data and the long reasoning traces of formal proof search, making both supervised fine-tuning (SFT) and sampling expensive. We introduce Pythagoras-Prover, a compute-efficient open-source family of Lean theorem provers built for practical compute budgets. The family spans two generation paradigms: autoregressive models at 4B and 32B parameters, and a first proof-of-concept diffusion-based prover (4B) that iteratively refines Lean proofs at inference time. For training efficiency, we build a Lean-verified corpus stratified into easy, medium, and hard problems for curriculum SFT, so models acquire proof skills progressively from shorter, simpler proofs to longer, harder ones. During SFT, a dynamic proof-reasoning filtering scheme preserves informative proof traces while keeping each instance within an 8k-token context budget. We also introduce Augmented Lean Formalisation (ALF), which expands scarce verified corpora into variants of formal statements, populated via self-distillation for extra training signal without formally verifying every mutated instance. By perturbing known problems while preserving their formal character, ALF reduces reliance on any statement's surface form. Empirically, Pythagoras-Prover-4B surpasses DeepSeek-Prover-V2-671B at pass@32 on MiniF2F-Test (86.1% vs 82.4%) with ~167x fewer parameters, while Pythagoras-Prover-32B sets the open-source state of the art at 93.0% on MiniF2F-Test and solves 93 of 672 PutnamBench problems. We release MiniF2F-ALF, an ALF-mutated contamination-sensitive benchmark on which every evaluated model loses accuracy; here our 32B remains strongest and our 4B matches the prior state of the art, Goedel-Prover-V2-32B.

2606.12590 2026-06-12 cs.CV cs.AI New submission

Analyzing and Improving Fine-grained Preference Optimization in Medical LVLMs

Shayan Mohammadizadehsamakosh, Pritam Sarkar, Leonid Sigal, Ali Etemad, Elham Dolatabadi

Affiliation: York University; University of British Columbia; Vector Institute; Queen’s University

AI summary: To address shortcomings of medical large vision-language models in factual consistency, visual grounding, and clinical alignment, proposes a fine-grained on-policy preference optimization framework that combines a bidirectional token-wise KL regularizer with a visual-contrastive grounding objective and builds preference pairs by minimally editing model outputs, correcting only clinically erroneous spans, significantly improving diagnostic accuracy.

Abstract

Large Vision-Language Models (LVLMs) have achieved strong performance across medical imaging tasks, yet they remain prone to factual inconsistencies, poor visual grounding, and misalignment with clinically meaningful feedback. Existing post-training alignment approaches, including Direct Preference Optimization (DPO) and its variants, face three critical limitations in the medical domain: (1) sequence-level reward signals treat clinically critical tokens identically to generic filler text; (2) reliance on static supervised fine-tuning references as preferred responses introduces an off-policy distribution shift, steering optimization toward stylistic artifacts over clinical correctness; and (3) alignment objectives lack explicit visual grounding constraints, leaving models insensitive to subtle yet diagnostically decisive pathological features. Our method leverages a bidirectional token-wise KL regularizer alongside a visual-contrastive grounding objective that pairs clean and lesion-corrupted images to penalize responses generated without adequate visual evidence. Together, these components form a fine-grained, on-policy alignment framework that constructs preference pairs by minimally editing model-generated outputs, correcting only clinically erroneous spans while preserving the original linguistic style. Extensive experiments across medical imaging tasks and clinical text generation benchmarks validate the effectiveness of our approach.

2606.12587 2026-06-12 cs.AI cs.HC New submission

Strategic Decision Support for AI Agents

Shayan Kiyani, Sima Noorani, George Pappas, Hamed Hassani

Affiliation: University of Pennsylvania

AI summary: To address reliability when AI agents are the primary decision-makers, proposes a strategic decision support framework that minimizes support usage while controlling a counterfactual missed-support error via an optimization problem, and develops an online algorithm that adaptively thresholds a support score.

Abstract

Traditionally, decision support studies how humans use machine learning models to make better decisions. In modern agentic systems, this division of roles is increasingly reversed: AI agents act on behalf of users, while humans and tools become support mechanisms around them. This role reversal brings reliability concerns to the forefront, since agentic errors can be consequential and agent behavior must remain aligned with human goals and constraints. Departing from the classical view of decision support, we revisit its two basic principles, the cost--value tradeoff of seeking support and the role of uncertainty quantification, in a setting where AI agents are the central actors. We propose a framework for strategic decision support for AI agents through an optimization problem that minimizes support usage subject to controlling a counterfactual missed-support error: the probability that the agent acts alone on instances where support would have materially improved its output. At the population level, we show that the optimal policy is a threshold rule on the value of support. Building on this structure, we develop an online algorithm that adaptively thresholds such a score and uses randomized exploration to control missed-support error without distributional assumptions. We further introduce a calibration-on-the-fly method that reduces unnecessary support calls online. We instantiate this framework across diverse scenarios, including information gathering, human--AI collaboration, and tool use, showing how each can be modeled through the same strategic decision-support lens. Experiments across these settings show that our method reliably controls the target error while substantially reducing support usage in practice.
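The threshold structure of the policy can be illustrated with a small offline sketch. It is not the paper's online algorithm: there is no randomized exploration or on-the-fly calibration here, and the function name, the calibration data, and the reading of missed-support error as a joint empirical fraction (agent acts alone and support would have helped) are assumptions.

```python
def pick_threshold(scores: list[float], helped: list[bool], alpha: float) -> float:
    """Largest threshold tau (among observed scores) such that the empirical
    fraction of instances with score < tau AND helped is at most alpha.
    The agent asks for support iff score >= tau, so a larger tau means less support."""
    n = len(scores)
    best = min(scores)  # asking for support everywhere gives zero missed support
    for tau in sorted(set(scores)):
        missed = sum(1 for s, h in zip(scores, helped) if s < tau and h) / n
        if missed <= alpha:
            best = tau
        else:
            break  # missed-support error only grows as tau increases
    return best

# Hypothetical calibration set: a support-value score per instance and whether
# support would actually have improved the agent's output.
scores = [0.1, 0.2, 0.3, 0.6, 0.8, 0.9]
helped = [False, False, True, False, True, True]

tau = pick_threshold(scores, helped, alpha=0.2)
support_rate = sum(s >= tau for s in scores) / len(scores)
print(tau, support_rate)
```

On this toy data the agent ends up asking for support only on the two highest-scoring instances, while the single helpful instance it skips keeps the missed-support fraction at 1/6, under the 0.2 budget.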


Observable Patterns Are Not Explanations: A Causal-Geometric Analysis of Latent Reasoning Models

Forecasting Is Not Attribution: Localizing Decoder Bypass in Graph-Based Neural Marketing Mix Models

From AGI to ASI

How Useful is Causal Invariance for Domain Adaptation in Finite-Sample Settings?

Fed-FBD: Federated Functional Block Diversification for Isolation, Privacy, and Surgical Unlearning

A Zero-shot Generalized Graph Anomaly Detection Framework via Node Reconstruction

BASENet: Band-Adapted Speech Enhancement Network with Cross-Band Attention

Physics-Informed Neural Networks for Chemotherapy Pharmacokinetics: Benchmarking the Clinical Estimator and Exposing Parameter Identifiability

TrajGenAgent: A Hierarchical LLM Agent for Human Mobility Trajectory Generation

MentalMARBERT: Domain-Adaptive Pre-training and Two-Stage Fine-Tuning for Arabic Mental Health Disorders Detection

TEDD: Robust Detection of Unstable Temporal Features

Individual Control Barrier Functions-Guided Diffusion Model for Safe Offline Multi-Agent Reinforcement Learning

The Metric Picks the Winner: Evaluation Choice Flips Model Rankings for Drug-Response Prediction in Unseen Chemistry

CD-RCM: Generalizable Continuous-Depth Novel View Synthesis for Reflectance Confocal Microscopy

Keep Policy Gradient in Charge: Sibling-Guided Credit Distillation for Long-Horizon Tool-Use Agents

ECA: Efficient Continual Alignment for Open-Ended Image-to-Text Generation

Context-Aware Feature-Fusion for Co-occurring Object Detection in Autonomous Driving

PersonaDrive: Human-Style Retrieval-Augmented VLA Agents for Closed-Loop Driving Simulation

Towards Provably Fair Machine Learning: Bayesian Approaches For Consistent and Transparent Predictions

DARRMS -- An Efficient Algorithm for Dynamic Attention Radius in Resource-Constrained Multi-Agent Systems

The Mathematics of AI Winters: The mathematical Taxonomy of Paradigm Fragility in AI Winter

Viral Proteins Reveal Geometry of Protein Language Models

Shopping Reasoning Bench: An Expert-Authored Benchmark for Multi-Turn Conversational Shopping Assistants

EgoEngine: From Egocentric Human Videos to High-Fidelity Dexterous Robot Demonstrations

From Imitation to Alignment: Human-Preference Flow Policies for Long-Horizon Sidewalk Navigation

Dual-State Slot Attention: Decoupling Appearance and Identity for Video Object-Centric Learning

Emerging Flexible Designs for Geospatial Multimodal Foundation Models

Pythagoras-Prover: Advancing Efficient Formal Proving via Augmented Lean Formalisation

Analyzing and Improving Fine-grained Preference Optimization in Medical LVLMs

Strategic Decision Support for AI Agents