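arXivDaily: a daily academic digest synchronized with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.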

2306.13985 2026-05-27 stat.ML cs.AI cs.LG stat.ME

Robust Classification of High-Dimensional Data using Data-Adaptive Energy Distance

Jyotishka Ray Choudhury, Aytijhya Saha, Sarbojit Roy, Subhajit Dutta

Affiliations: Indian Statistical Institute, Kolkata, India; School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, USA; Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Saudi Arabia; Applied Statistics Unit, Indian Statistical Institute, Kolkata, India; Department of Mathematics and Statistics, Indian Institute of Technology Kanpur, India

AI summary: For high-dimensional, low-sample-size data, the paper proposes robust classifiers that need no tuning parameters and no moment conditions, achieve perfect classification in the HDLSS asymptotic regime, and show advantages in simulations and real-data analysis.

Comments Published at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), 2023

DOI: 10.1007/978-3-031-43424-2_6
Journal ref: In: ECML PKDD 2023: Research Track. Lecture Notes in Computer Science, vol 14173. Springer, Cham (2023)

Abstract

Classification of high-dimensional low sample size (HDLSS) data poses a challenge in a variety of real-world situations, such as gene expression studies, cancer research, and medical imaging. This article presents the development and analysis of some classifiers that are specifically designed for HDLSS data. These classifiers are free of tuning parameters and are robust, in the sense that they are devoid of any moment conditions of the underlying data distributions. It is shown that they yield perfect classification in the HDLSS asymptotic regime, under some fairly general conditions. The comparative performance of the proposed classifiers is also investigated. Our theoretical results are supported by extensive simulation studies and real data analysis, which demonstrate promising advantages of the proposed classification techniques over several widely recognized methods.
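As a rough illustration of the kind of distance-based rule the title refers to, the sketch below classifies a point using the classical (Euclidean) energy distance between that point and each class sample. It is not the paper's data-adaptive variant, which the abstract does not specify. The function names `energy_score` and `classify` and the toy data are assumptions made for illustration.

```python
import numpy as np

def energy_score(z, sample):
    """Classical energy distance between a point mass at z and a sample.

    Computes 2 * mean ||z - X_i|| - mean_{i != j} ||X_i - X_j||.
    """
    n = sample.shape[0]
    cross = np.linalg.norm(sample - z, axis=1).mean()
    pairwise = np.linalg.norm(sample[:, None, :] - sample[None, :, :], axis=-1)
    within = pairwise.sum() / (n * (n - 1))  # diagonal entries are zero
    return 2.0 * cross - within

def classify(z, samples):
    """Assign z to the class whose sample has the smallest energy score."""
    scores = [energy_score(z, s) for s in samples]
    return int(np.argmin(scores))

# Toy HDLSS-style data (assumed): 10 points per class in 50 dimensions.
rng = np.random.default_rng(0)
dim = 50
class0 = rng.normal(0.0, 1.0, size=(10, dim))
class1 = rng.normal(5.0, 1.0, size=(10, dim))
label_near_0 = classify(np.zeros(dim), [class0, class1])
label_near_1 = classify(np.full(dim, 5.0), [class0, class1])
```

The rule has no tuning parameter, which matches the spirit of the abstract, but the Euclidean norm used here does rely on the data scale; the paper's robustness claims concern its own construction.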

2502.06567 2026-05-27 stat.ML cs.LG

Membership Inference Risks in Quantized Models: A Theoretical and Empirical Study

Eric Aubinais, Philippe Formont, Pablo Piantanida, Elisabeth Gassiat

Affiliations: Université Paris-Saclay, CNRS, Laboratoire de mathématiques d'Orsay, France; Université Paris-Saclay, ILLS, MILA, ÉTS, Montreal, Canada; ILLS, MILA, CNRS, CentraleSupélec, Montreal, Canada

AI summary: Using theoretical analysis and empirical methods, the paper studies how post-training quantization affects the membership inference privacy risk of machine learning models, and proposes a new membership inference security indicator.

Journal ref: AISTATS 2026

Abstract

Quantizing machine learning models has demonstrated its effectiveness in lowering memory and inference costs while maintaining performance levels comparable to those of the original models. In this work, we investigate the impact of quantization procedures on privacy in data-driven models, focusing on their vulnerability to membership inference attacks. Membership Inference Security (MIS) has recently been proposed to characterize the privacy of machine learning models against the most powerful (and possibly unknown) attacks. However, quantifying MIS appears to be computationally very difficult. In this paper, we propose a new MIS indicator for post-training quantization procedures of machine learning models that minimize an empirical loss. This new indicator is a byproduct of a theoretical asymptotic analysis of the MIS in this context. We also present a methodology for empirically estimating our MIS indicator. Using synthetic datasets and real-world data (in the context of drug discovery), we demonstrate the effectiveness of our approach in assessing and ranking the MIS of different quantizers.

2003.05746 2026-05-27 cs.LO cs.AI cs.DB

Querying and Repairing Inconsistent Prioritized Knowledge Bases: Complexity Analysis and Links with Abstract Argumentation

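Meghyn Bienvenu, Camille Bourgaux

Affiliations: CNRS & University of Bordeaux, France; DI ENS, ENS, CNRS, PSL University & Inria, Paris, France

AI summary: The paper studies inconsistency handling in prioritized knowledge bases. It defines globally-, Pareto- and completion-optimal repairs, analyzes the data complexity of query entailment under repair-based semantics, of deciding whether a unique optimal repair exists, and of enumerating optimal repairs, and relates optimal repairs to extensions of abstract argumentation frameworks.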

Comments This is an extended version of a paper appearing at the 17th International Conference on Principles of Knowledge Representation and Reasoning (KR 2020). This version corrects the statement of Theorem 43 (missing hypothesis). 27 pages

Abstract

In this paper, we explore the issue of inconsistency handling over prioritized knowledge bases (KBs), which consist of an ontology, a set of facts, and a priority relation between conflicting facts. In the database setting, a closely related scenario has been studied and led to the definition of three different notions of optimal repairs (global, Pareto, and completion) of a prioritized inconsistent database. After transferring the notions of globally-, Pareto- and completion-optimal repairs to our setting, we study the data complexity of the core reasoning tasks: query entailment under inconsistency-tolerant semantics based upon optimal repairs, existence of a unique optimal repair, and enumeration of all optimal repairs. Our results provide a nearly complete picture of the data complexity of these tasks for ontologies formulated in common DL-Lite dialects. The second contribution of our work is to clarify the relationship between optimal repairs and different notions of extensions for (set-based) argumentation frameworks. Among our results, we show that Pareto-optimal repairs correspond precisely to stable extensions (and often also to preferred extensions), and we propose a novel semantics for prioritized KBs which is inspired by grounded extensions and enjoys favourable computational properties. Our study also yields some results of independent interest concerning preference-based argumentation frameworks.

2605.27343 2026-05-27 cs.CV cs.LG

Towards Controllable Image Generation through Representation-Conditioned Diffusion Models

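Nithesh Chandher Karthikeyan, Jonas Unger, Gabriel Eilertsen

AI summary: The paper proposes conditioning diffusion models on representations from pretrained self-supervised models to achieve controllable image generation without extensive annotation, and explores the smoothness and disentanglement properties of the representation space.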

2605.27130 2026-05-27 cs.LG cs.AI

DEI: Diversity in Evolutionary Inference for Quality-Diversity Search

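John Donaghy, Shikhar Rastogi

AI summary: Proposes the DEI framework, which performs distributed quality-diversity search with heterogeneous large language models as mutation operators; experiments indicate that model diversity improves search performance more than parallelism does.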

Comments Accepted to ICML 2026 Workshop Scalable Learning and Optimization for Efficient Multimodal AI Agents (SCALE)

Abstract

We present DEI: Diversity in Evolutionary Inference, a distributed Quality-Diversity (QD) search framework that assigns heterogeneous large language models (LLMs) as mutation operators across peer nodes communicating with non-blocking collective operations. Unlike homogeneous parallel search, which replicates a single model's inductive biases across all workers, DEI treats each LLM's distinct creative prior as a complementary source of behavioral novelty. Extending the Digital Red Queen framework with DEI, nodes share local optimal solutions at the end of each round to seed the next round's population. This creates cross-model adversarial pressure that drives robustness beyond intra-model self-play. Evaluated on the Core War domain, a competitive programming benchmark in which Redcode warrior programs battle inside a simulated machine, a four-node heterogeneous ensemble (GPT-5.4-mini, Claude Sonnet 4.6, GPT-5.2, and Claude Haiku 4.5) achieves 124 percent higher merged-archive QD-Score (45.90 vs. 20.46) and 28 percent higher coverage (80.6 percent vs. 63.0 percent of cells) than a single-node baseline at equal total LLM-call budget. The heterogeneous ensemble also outperforms an equally-budgeted homogeneous ensemble on QD-Score, coverage, and held-out solution generality across all four model families. These results provide the first empirical evidence that model diversity, not merely parallelism, is the key driver of gain in distributed LLM-based QD search.
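The relative gains quoted in the abstract can be reproduced from the raw figures it reports. The short check below is only arithmetic on those four numbers; the helper name `percent_higher` is an assumption.

```python
def percent_higher(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline

# Figures taken from the abstract.
qd_gain = percent_higher(45.90, 20.46)      # merged-archive QD-Score
coverage_gain = percent_higher(80.6, 63.0)  # coverage, percent of cells
coverage_gain_points = 80.6 - 63.0          # same gap in percentage points
```

Note that the 28 percent coverage figure is a relative gain; in absolute terms the gap is 17.6 percentage points.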

2605.27072 2026-05-27 cs.CL cs.AI

E3: Issue-Level Backtesting for Automated Research Critique

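Yashwardhan Chaudhuri, Sanyam Jain, Paridhi Mundra

AI summary: Proposes E3, an automated review assistant, and evaluates it with an issue-level backtesting protocol on identifying technical concerns in research papers; it attains the highest recall compared with human reviews and LLM baselines.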

Abstract

We present E3, an automated review assistant that augments reviewers and engineering teams by identifying decision-relevant technical concerns in research papers. For each concern, E3 reports its nature, its location, its bearing on the contribution, and the analysis or evidence that would resolve it, covering unsupported claims, missing ablations, weak baselines, hidden assumptions, threats to validity, and leakage risks. To evaluate E3 without contamination confounds we adopt an issue-level backtesting protocol: the corpus is restricted to papers postdating the training cutoff of every automated source, and for each paper a meta-judge that observes only anonymised reviews labels every issue-source pair as Caught, Partial, or Missed. Applied to 100 ICLR 2026 papers and 4598 judged issue rows, comparing E3 against the ICLR human reviews and two prompt-matched LLM baselines built on gpt-5.4 from OpenAI and claude-opus-4-6 from Anthropic, with meta-judge gpt-5.5, E3 attains the highest recall on every aggregate metric. Partial-inclusive recall reaches 90.2 percent, which is 15.5 points over GPT, 17.1 points over Claude, and 29.2 points over the human reviews, and strict recall preserves the ordering at 65.8 percent. On concerns raised by the human reviewers, E3 recovers 89.6 percent; on concerns the human reviewers missed it surfaces 1635 additional rows admitted into the judged union, 406 above the next-best source. Corpus, baseline prompts, judge prompt template, and evaluation code are released.
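A minimal sketch of how the two recall figures could be computed from meta-judge labels, assuming that strict recall counts only Caught rows and partial-inclusive recall counts Caught plus Partial rows, each divided by all judged rows for a source. These definitions and the function name `recall_metrics` are assumptions; the abstract does not spell them out.

```python
from collections import Counter

def recall_metrics(labels):
    """Return (strict, partial_inclusive) recall for one source's labels."""
    if not labels:
        raise ValueError("no judged rows")
    counts = Counter(labels)
    total = len(labels)
    strict = counts["Caught"] / total
    partial_inclusive = (counts["Caught"] + counts["Partial"]) / total
    return strict, partial_inclusive

# Toy labels (hypothetical), not data from the paper.
toy_labels = ["Caught"] * 6 + ["Partial"] * 3 + ["Missed"]
strict_recall, partial_recall = recall_metrics(toy_labels)
```

Under the abstract's stated gaps, the partial-inclusive recalls implied for the other sources are 74.7 percent (GPT), 73.1 percent (Claude), and 61.0 percent (human reviews).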

2605.26956 2026-05-27 cs.AI cs.CL

LELA: An End-to-end LLM-based Entity Linking Framework with Zero-shot Domain Adaptation

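Samy Haffoudhi, Nikola Dobričić, Fabian Suchanek, Nils Holzenberger

AI summary: The paper proposes LELA, a modular, domain-agnostic entity disambiguation method based on large language models, and extends it into a practical Python library that integrates zero-shot named entity recognition for end-to-end entity linking; experiments validate its cross-domain performance and robustness.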

2605.26936 2026-05-27 cs.RO

A Bioinspired Underwater Robot with a Latch-Mediated Soft Bistable Mechanism

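Chongze Bi, Wenjie Wu, Zonghao Zuo, Li Wen

AI summary: The paper proposes a bioinspired soft bistable actuator with an integrated latch mechanism that enables asymmetric energy input and release using a single motor. Combined with fin structures it achieves efficient underwater propulsion and maneuvering, and experiments verify stable flapping, precise steering, and multimodal motion.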

Comments 6 pages, 6 figures

Abstract

Underwater robotics has advanced significantly over recent decades. However, the development of miniaturized underwater robots remains limited by the low energy densities of traditional power sources. Nature offers compelling solutions: organisms like mantis shrimps and fleas utilize latch-mediated spring actuation (LaMSA) systems that achieve rapid movements through a decoupled energy storage and release mechanism. Despite extensive studies of LaMSA, replicating such rapid, asymmetric actuation within simple, compact structures remains challenging. In this work, we introduce a bioinspired, soft bistable actuator with an integrated latch mechanism that enables asymmetric energy input and release using a single motor. Coupled with fin structures, this design facilitates efficient underwater propulsion and maneuverability. Experimental results demonstrate stable periodic flapping, precise steering, and a maximum thrust of 0.528 N, impulse of 0.147 Ns, and vertical displacement of 30 mm. By modulating fin angles, the robot achieves versatile motions, including vertical ascent, diagonal forward movement, and lateral translation. This study presents a novel, energy-efficient approach for controlling motion in compact underwater robots, paving the way for advanced biomimetic designs with potential applications in exploration, environmental monitoring, and inspection.

2605.26715 2026-05-27 cs.LG

Image Feature Fusion-based Federated Client Unlearning (FCU)

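Hangyi Shen, Yizhi Pan, Tiansuo Li, Weiqi Jiang, Guanqun Sun

AI summary: To address the drop in global generalization caused by catastrophic forgetting in federated unlearning, the paper proposes a federated client unlearning method based on a linear image feature fusion mechanism (Mixup) that dynamically generates mixed samples to bridge the forget and retain distributions, achieving unlearning on medical imaging benchmarks comparable to the retraining gold standard.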

Abstract

Major data protection regulations all mention the "right to be forgotten," and that's what pushed federated unlearning (FU) techniques forward. But one stubborn issue remains: catastrophic forgetting--you erase the target knowledge, yet somehow you also end up throwing out essential retained knowledge, which then hurts the model's global generalization. To get a better balance between unlearning effectiveness and generalization ability, we propose something called Image Feature Fusion-based Federated Client Unlearning (IFF-FCU). The idea is to bring in a linear Image Feature Fusion mechanism (Mixup) that dynamically creates mixed samples, bridging the gap between forget-distribution and retain-distribution. What this strategy does isn't just deleting a few discrete data points--it theoretically widens and regularizes the forgetting boundary. We ran extensive experiments on medical imaging benchmarks (RSNA-ICH and ISIC2018), and the results show that our approach achieves reasonably good unlearning. For instance, on the ICH dataset, IFF-FCU achieves a highly competitive Error deviation from the retrained gold standard, demonstrating robust improvements over existing baselines.
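The linear fusion the abstract describes follows the standard Mixup form, a convex combination of two inputs. The sketch below mixes a forget-set batch with a retain-set batch. How IFF-FCU pairs samples, draws the mixing weight, and treats labels is not stated in the abstract, so the Beta(1, 1) draw and the name `mixup` are assumptions.

```python
import numpy as np

def mixup(x_forget, x_retain, lam):
    """Convex combination lam * x_forget + (1 - lam) * x_retain."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * x_forget + (1.0 - lam) * x_retain

# Toy batches (assumed): arrays standing in for images or feature maps.
rng = np.random.default_rng(0)
x_forget = np.ones((4, 8))
x_retain = np.zeros((4, 8))
lam = rng.beta(1.0, 1.0)  # mixing weight, drawn fresh per batch
mixed = mixup(x_forget, x_retain, lam)
quarter = mixup(x_forget, x_retain, 0.25)
```

Because every mixed sample lies on the segment between a forget sample and a retain sample, the mixed batch covers the region between the two distributions rather than only the discrete forget points.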

2605.22511 2026-05-27 cs.AI cs.CL cs.IR

Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning

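Zihan Liang, Yufei Ma, Ben Chen, Zhipeng Qian, Xuxin Zhang, Huangyu Dai, Lingtao Mao

AI summary: Proposes Search-E1, which interleaves vanilla GRPO with on-policy self-distillation (OPSD) so that a search-augmented agent can self-evolve without external supervision or complex modules, surpassing all open-source baselines on seven QA benchmarks with a 3B model.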

Abstract

Post-training has become the dominant recipe for turning a language model into a competent search-augmented reasoning agent. A line of recent work pushes its performance further by adding elaborate machinery on top of this standard pipeline. These augmentations import external supervision from stronger external systems, attach auxiliary modules such as process reward models or retrospective critics, restructure the rollout itself with tree search or multi-stage curricula, or shape the reward with hand-crafted bonuses and penalties. Each addition delivers a measurable gain, but each also inflates the training pipeline and ties the recipe to resources or designs that may not always be available. We take a step back and ask whether any of this machinery is actually necessary, and propose Search-E1, a self-evolution method that lets a search-augmented agent improve through only vanilla GRPO interleaved with on-policy self-distillation (OPSD). After each GRPO round, the policy rolls out on its own training questions. A token-level forward KL objective then aligns the policy's inference-time distribution to its own distribution under a privileged context that exposes a more efficient sibling trajectory. Despite this simplicity, the procedure naturally provides dense per-step supervision. On seven QA benchmarks, Search-E1 reaches 0.440 average EM with Qwen2.5-3B, surpassing all open-source baselines at both scales. Code and complete version will be made public soon.
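A minimal sketch of a token-level forward KL term, assuming "forward" means KL(teacher || student), where the teacher is the policy's distribution under the privileged context and the student is its inference-time distribution. The logits here are random placeholders, and the function names are assumptions; this is not the paper's training code.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last (vocabulary) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def token_forward_kl(teacher_logits, student_logits):
    """Mean over tokens of KL(teacher || student)."""
    log_p = log_softmax(teacher_logits)
    log_q = log_softmax(student_logits)
    per_token = (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)
    return float(per_token.mean())

# Placeholder logits: 5 token positions, vocabulary of 11 (assumed sizes).
rng = np.random.default_rng(0)
teacher = rng.normal(size=(5, 11))
student = rng.normal(size=(5, 11))
kl_same = token_forward_kl(teacher, teacher)
kl_diff = token_forward_kl(teacher, student)
```

Each token position contributes its own loss term, which is what makes the supervision dense and per-step compared with a single trajectory-level reward.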

2605.26476 2026-05-27 cs.CL cs.IR

FAB-Bench: A Framework for Adaptive RAG Benchmarking in Semiconductor Manufacturing

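Jingbin Qian, Congwen Yi, Min Xia, Wen Wu, Jun Zhu, Jian Guan

AI summary: Proposes the FAB-Bench framework, which uses six diagnostic metrics and three synthesis strategies to evaluate RAG systems for semiconductor manufacturing across context windows, and finds that attention dilution is the main cause of performance degradation at extreme context lengths.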

Abstract

Retrieval-Augmented Generation (RAG) has become critical for knowledge-intensive applications, yet evaluating its performance in vertical domains remains difficult due to domain complexity, diverse context scales, and heavy reliance on expert assessments that are costly, inconsistent, and non-scalable. We introduce FAB-Bench, an end-to-end framework for adaptive benchmarking of RAG systems in semiconductor manufacturing. FAB-Bench defines six diagnostic metrics measuring factual accuracy, contextual utilization, completeness, retrieval relevance, technical depth, and reasoning consistency. The framework couples retriever diagnostics with generator-level reasoning analysis across context windows of 4K-32K tokens, quantifying how retrieval precision and generative fidelity co-evolve as contextual scope expands. From over 1,300 generated candidates, we curated a high-quality benchmark of 200 query-answer pairs spanning three synthesis strategies: needle-in-haystack, intra-document multi-topic, and cross-document multi-hop. Systematic evaluation across four LLMs and four RAG frameworks reveals three distinct context-scaling behaviors: logarithmic growth, early saturation, and cold-start dynamics, and identifies attention dilution as the primary mechanism behind performance degradation at extreme context lengths. Cross-framework validation on three additional production RAG systems confirms evaluation portability.

2605.26414 2026-05-27 cs.AI cs.CL cs.LG

Reasoning, Code, or Both? How Large Language Models Handle Variations in Math Questions

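Matthew Kutakh

AI summary: By comparing chain-of-thought reasoning, single-shot code execution, and iterative code execution on the GSM-Symbolic dataset, the study finds that code execution does not improve the reasoning robustness of large language models on variations of math problems.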

Comments 6 pages, 4 figures, 2 tables

Abstract

Large Language Models (LLMs) achieve impressive accuracy on mathematical reasoning benchmarks, yet their performance drops when problems are modified with simple changes like different names or numbers. Code execution methods, which let models generate and run Python code instead of reasoning in natural language, have been proposed as a solution, but their effect on reasoning robustness (the ability to maintain accuracy across problem variations) has not been systematically tested. This study evaluates three approaches on 1,000 problems from the GSM-Symbolic dataset: pure reasoning using chain-of-thought (CoT) prompting, single-shot code execution using Program-Aided Language models (PAL), and iterative code execution using Step-by-Step Coding (SBSC). All three were run on paired original and modified problems using Claude Haiku 4.5. CoT was the most robust method, with an accuracy drop of 1.3 percentage points and 1.8% of problems breaking under perturbation. PAL was the least robust at 1.7 percentage points and 3.1% broke, with SBSC falling in between. Although these differences were not statistically significant ($p = .096$), the directional trend was consistent across all measures, suggesting that code execution, whether single-shot or iterative, does not improve reasoning robustness on grade-school-level problem variations.
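The two robustness figures in the abstract measure different things, which a small sketch can make concrete. It assumes the accuracy drop is original accuracy minus modified accuracy, in percentage points, and that a problem "breaks" when it is solved in its original form but not in its modified form. Both definitions and the name `robustness_metrics` are assumptions.

```python
def robustness_metrics(pairs):
    """pairs: list of (correct_on_original, correct_on_modified) booleans.

    Returns (accuracy drop in percentage points, percent of problems broken).
    """
    n = len(pairs)
    if n == 0:
        raise ValueError("no problem pairs")
    acc_original = sum(orig for orig, _ in pairs) / n
    acc_modified = sum(mod for _, mod in pairs) / n
    broken = sum(1 for orig, mod in pairs if orig and not mod) / n
    return 100.0 * (acc_original - acc_modified), 100.0 * broken

# Toy results (hypothetical): one problem breaks, one is newly solved.
pairs = [(True, True)] * 8 + [(True, False), (False, True)]
accuracy_drop_pp, broken_pct = robustness_metrics(pairs)
```

In the toy data the net accuracy drop is zero while 10 percent of problems break, which shows why the abstract reports both numbers: newly solved problems can mask broken ones in the aggregate accuracy.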

2605.26279 2026-05-27 cs.AI cs.CE

Constraint acquisition needs better benchmarks

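Rafał Stachowiak, Tomasz P. Pawlak

AI summary: To address the lack of suitable benchmarks for research on constraint acquisition (CA) and on validating and enhancing mathematical programming (MP) models, the paper proposes the MPMMine benchmark suite, which supports algorithm evaluation through a uniform structure, open formats, and diverse data.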

Comments 12 pages, 1 figure, for the associated dataset, see https://github.com/MPMMine/MPMMine

Abstract

Constraint Acquisition (CA) and related research on the validation and enhancement of Mathematical Programming (MP) models from domain knowledge artifacts are currently limited by inadequate benchmarks. This deficiency impedes reproducibility and cross-study comparability, slowing the maturation of CA methods. Existing benchmarks were designed for solver evaluation rather than for assessing CA algorithms. They are loosely organized, treat individual problems inconsistently, and omit the domain knowledge artifacts required by CA methods. This work presents MPMMine, a benchmark suite designed to assess algorithms that discover, validate, and enhance MP models using diverse domain knowledge artifacts. MPMMine is guided by consistency, standardization, completeness, extensibility, openness, and version control. It adopts a uniform structure and relies on open formats: MiniZinc, CommonMark, and JSON. It provides multiple models per problem, tens of instances per model, and thousands of solutions and non-solutions in both integer and continuous domains, alongside natural-language descriptions to support text-to-model methods.

2605.26171 2026-05-27 cs.LG

When Rule Violations Are Rare: Chimera Training for Logical Anomaly Detection

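Alejandro Ascarate, Leo Lebrat, Rodrigo Santa Cruz, Clinton Fookes, Olivier Salvado

AI summary: For logical anomaly detection where rule-violating samples are scarce, the paper proposes chimera training, which generates supervision through operand-level counterfactual construction at the feature level and improves rule-level anomaly detection.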

Comments 9+30 pages, 4+4 figures, under review

Abstract

Many practical anomalies are not merely rare inputs, but violations of semantic constraints: objects co-occur in structured ways, actions imply preconditions, and events satisfy temporal or relational regularities. We study anomaly detection in this setting, where constraints are given as logical rules over learned visual concepts, but real rule violations are rare or absent during training. We propose a neural rule evaluator that compiles each constraint into a directed acyclic graph and learns feature-aware subtree MLP gates for its internal logical operators. Each gate maps child features and edge-level negations to a parent representation and a rule-satisfaction probability, with intermediate supervision obtained from exact Boolean propagation over ground-truth concept labels. The key difficulty is that same-image training data often provide insufficient coverage of informative truth configurations and also allow shortcut solutions. To address this, we introduce chimera training: an operand-level counterfactual construction at the feature level. Instead of mixing input images, we concatenate subtree features from different samples; each operand keeps the hard truth label of the sample it came from, and the chimera target is obtained by applying the node's logical operator to those inherited labels. This supplies supervised logical counterexamples without requiring real anomalous images. Across CLEVRER, OpenImages, and VidOR, the resulting evaluator improves rule-level anomaly AUROC over independent-events and same-image semantic-training baselines, especially for compositional and relational rules. The method yields both scalar anomaly scores and rule-level attributions.
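The chimera construction described above can be sketched for a single binary gate: operand features come from two different samples, each operand keeps its own truth label, and the target is the gate's operator applied to those labels. The feature sizes, the `OPERATORS` table, and the name `make_chimera` are assumptions for illustration; the paper's gates and feature extractors are not shown.

```python
import numpy as np

# Boolean operators for a two-operand gate (assumed operator set).
OPERATORS = {
    "and": lambda a, b: a and b,
    "or": lambda a, b: a or b,
}

def make_chimera(feat_left, label_left, feat_right, label_right, op):
    """Concatenate operand features from two samples and derive the target.

    Each operand keeps the hard truth label of its source sample; the
    target is the operator applied to those inherited labels.
    """
    features = np.concatenate([feat_left, feat_right])
    target = OPERATORS[op](bool(label_left), bool(label_right))
    return features, target

# Toy operands (hypothetical): left from sample A (true), right from B (false).
feat_a = np.full(4, 1.0)
feat_b = np.full(4, -1.0)
chimera_features, and_target = make_chimera(feat_a, True, feat_b, False, "and")
_, or_target = make_chimera(feat_a, True, feat_b, False, "or")
```

Pairing a true operand with a false one yields a supervised example in which an AND rule is violated, even if no single training image contains that violation.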

2605.26147 2026-05-27 cs.LG

Neural Bayesian Sequential Routing

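Yongchao Huang

AI summary: Proposes Neural Bayesian Sequential Routing (NBSR), a framework that models neural inference as active evidence accumulation over a directed acyclic graph and uses a Dirichlet-Categorical conjugate framework for uncertainty quantification, early exiting, and resource-rational inference.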

Comments 71 pages

Abstract

Human decision-making is sequential and uncertainty-aware, yet standard neural networks often rely on static, dense forward computation with limited visibility into evidence acquisition, uncertainty evolution, or when computation should stop. We introduce Neural Bayesian Sequential Routing (NBSR), a framework that models neural inference as active evidence accumulation over a hierarchical Directed Acyclic Graph (DAG). Within a Dirichlet-Categorical conjugate framework, neural experts query a persistent global knowledge oracle to extract positive evidence vectors, which act as pseudo-counts and update a Dirichlet belief state by exact conjugate addition. Coupled with a Gumbel-Softmax Straight-Through estimator, this update enables hard, path-dependent routing while preserving surrogate gradients for end-to-end training. The resulting Dirichlet precision and entropy provide mechanisms for uncertainty quantification, entropy-based early exiting, OOD abstention, and cost-aware evidence acquisition. We prove that, under strictly positive evidence extraction, total Dirichlet precision increases monotonically along any valid trajectory and marginal predictive variance is bounded, formalizing sequential "hypothesis sharpening"; under idealized capacity and optimization assumptions, the terminal Dirichlet expectation recovers the Bayes-optimal conditional distribution. Empirical evaluations across visual categorization, structured medical diagnosis, language modeling, partially observable control, and cost-aware Bayesian experimental design show that NBSR achieves competitive predictive performance while providing transparent routing traces, path-dependent evidence attribution, uncertainty-aware decision control, and resource-rational inference. Overall, NBSR offers a mathematically grounded framework for interpretable, modular, and resource-rational agentic AI.
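A minimal sketch of the conjugate update and an entropy-based early exit, assuming that evidence vectors are added to the Dirichlet parameters as pseudo-counts and that the exit test uses the entropy of the Dirichlet mean. The exit criterion, threshold, evidence values, and function names are all assumptions; the neural experts, the oracle, and the Gumbel-Softmax routing are omitted.

```python
import numpy as np

def categorical_entropy(probs):
    """Shannon entropy (nats) of a strictly positive probability vector."""
    return float(-(probs * np.log(probs)).sum())

def accumulate(alpha0, evidences, entropy_threshold):
    """Add strictly positive evidence to a Dirichlet belief, step by step.

    Returns the final parameters, the precision (sum of parameters) after
    each step, and the number of evidence vectors consumed before exiting.
    """
    alpha = np.asarray(alpha0, dtype=float)
    precisions = [float(alpha.sum())]
    steps_used = 0
    for evidence in evidences:
        evidence = np.asarray(evidence, dtype=float)
        if np.any(evidence <= 0):
            raise ValueError("evidence must be strictly positive")
        alpha = alpha + evidence  # exact conjugate addition
        steps_used += 1
        precisions.append(float(alpha.sum()))
        if categorical_entropy(alpha / alpha.sum()) < entropy_threshold:
            break  # belief is sharp enough: exit early
    return alpha, precisions, steps_used

# Toy run (hypothetical evidence): uniform prior over three classes.
evidences = [[5.0, 0.1, 0.1], [20.0, 0.1, 0.1], [1.0, 1.0, 1.0]]
alpha, precisions, steps_used = accumulate(np.ones(3), evidences, 0.5)
```

Because every evidence entry is strictly positive, the precision grows at each step, which is the monotonicity property the abstract states; in the toy run the loop stops before the third evidence vector is consumed.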

2605.24785 2026-05-27 cs.AI

PANDO: Efficient Multimodal AI Agents via Online Skill Distillation

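Yubo Li, Yidi Miao, Yuntian Shen, Yuxin Liu

AI summary: Proposes the PANDO framework, which uses online skill distillation, a structured skill library, and cache-aware prompting to achieve a higher success rate on VisualWebArena tasks with lower token consumption.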

Abstract

Recent advances in multimodal web agents often rely on increased inference-time computation, including rollout search, verifier passes, offline skill discovery, and specialist model stacks. This raises a central question: can a web agent become more efficient as it accumulates experience, rather than more expensive? We first analyze trajectories from VisualWebArena and identify three recurring sources of inefficiency: repeat-action loops, hidden discovery costs, and low prompt-cache reuse. We then introduce PANDO, a single-rollout online skill-distillation framework that maintains a structured Skill Library and combines progress reflection, confidence-based skill demotion, hierarchical routing, visual compression, and cache-aware prompting. On the full set of 910 VisualWebArena tasks, PANDO achieves a 58.3% success rate, outperforming SGV (54.0%) and our WALT reproduction (45.2%), while using 58% fewer tokens than SGV and 61% fewer tokens than WALT, without any pre-evaluation discovery budget. A 300-task ablation further shows that rules and routines provide most of the success gains, while routing, compression, and cache-aware prompting convert the larger skill library into lower marginal token cost. Finally, we introduce three trajectory-level efficiency metrics -- Action Repetition Rate, Step Overhead Ratio, and Prompt Cache Utilization -- to make efficiency visible beyond terminal success.

2605.06213 2026-05-27 cs.AI

Beyond Fixed Benchmarks and Worst-Case Attacks: Dynamic Boundary Evaluation for Language Models

超越固定基准和最坏情况攻击：语言模型的动态边界评估

Haoxiang Wang, Da Yu, Huishuai Zhang

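AI summary: Proposes Dynamic Boundary Evaluation (DBE), which locates boundary items whose pass probability is near 0.5 under random-sampling decoding and builds an evaluation protocol on a unified difficulty scale, addressing the saturation of fixed benchmarks.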

Comments This submission is being withdrawn because it was submitted without the knowledge and authorization of all co-authors. The authors need to resolve this authorship/authorization issue before any public posting

AI中文摘要

当前评估大型语言模型（LLM）依赖于固定基准，这些基准对所有模型应用相同的测试项，产生天花板和地板效应，掩盖了能力差距。我们认为最具信息量的评估信号位于边界，即在随机采样解码下每个提示的通过概率接近0.5，并提出了动态边界评估（DBE），它主动定位每个模型的边界，并将其置于全局可比的难度尺度上。DBE提供三个产物：(i) 一个校准的题库，涵盖安全性、能力和真实性，其每项难度标签在9个参考LLM上得到验证；(ii) 技能引导的边界搜索（SGBS），一种仅通过API级查询访问即可为目标LLM找到边界项的搜索算法；(iii) 一个评估协议，将新的LLM置于统一的能力尺度上，并在目标超出题库覆盖范围时自适应地扩展评估集。我们在四个类别上实例化DBE，涵盖安全性（有害请求拒绝和过度拒绝）、能力（受限指令遵循）和真实性（多轮谄媚抵抗）。由此产生的评估覆盖更广泛的模型谱系而不饱和，同时与现有数据集兼容。

英文摘要

Evaluating large language models (LLMs) today rests on fixed benchmarks that apply the same set of items to any model, producing ceiling and floor effects that mask capability gaps. We argue that the most informative evaluation signal lies at the boundary, where the per-prompt pass probability is near $0.5$ under random-sampling decoding, and propose Dynamic Boundary Evaluation (DBE), which actively locates each model's boundary and places it on a globally comparable difficulty scale. DBE delivers three artifacts: (i) a calibrated item bank covering safety, capability, and truthfulness, with per-item difficulty labels validated across $9$ reference LLMs; (ii) Skill-Guided Boundary Search (SGBS), a search algorithm that finds boundary items for a given target LLM using only API-level query access; and (iii) an evaluation protocol that places a new LLM on a unified ability scale and grows the evaluation set adaptively when the target falls outside the bank's coverage. We instantiate DBE on four categories spanning safety (harmful request refusal and over-refusal), capability (constrained instruction following), and truthfulness (multi-turn sycophancy resistance). The resulting evaluation covers a broader model spectrum without saturation while remaining compatible with existing datasets.
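The claim that items with pass probability near 0.5 carry the most signal can be illustrated with the entropy of a single pass/fail outcome, which peaks at 0.5. The sketch below is only an illustration of that idea, not the SGBS algorithm from the abstract; the function names and the `tolerance` threshold are assumptions introduced here.

```python
import math


def bernoulli_entropy_bits(p: float) -> float:
    """Entropy (in bits) of one pass/fail outcome with pass probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain outcome carries no information
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def estimate_pass_probability(outcomes):
    """Fraction of sampled generations that passed (1 = pass, 0 = fail)."""
    return sum(outcomes) / len(outcomes)


def is_boundary_item(outcomes, tolerance=0.15):
    """Hypothetical rule: an item is 'at the boundary' if its estimated
    pass probability lies within `tolerance` of 0.5."""
    return abs(estimate_pass_probability(outcomes) - 0.5) <= tolerance


if __name__ == "__main__":
    for p in (0.05, 0.5, 0.95):
        print(p, round(bernoulli_entropy_bits(p), 3))
```

Items that a model always passes or always fails contribute zero entropy, which is one way to read the ceiling and floor effects described above.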

2603.17685 2026-05-27 cs.LG

Flow Matching Policy Optimization with Mirror Descent and Entropy Constraints

基于镜像下降和熵约束的流匹配策略优化

Ting Gao, Stavros Orfanoudakis, Nan Lin, Winnie Daamen, Serge Hoogendoorn, Elvin Isufi

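AI summary: To address the challenge of balancing policy expressiveness with the exploration-exploitation trade-off in online reinforcement learning, proposes FMER, a framework based on ODE flow matching that combines simulation-free policy optimization, a tractable entropy objective, and dynamic temperature tuning, achieving superior performance on sparse-reward tasks.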

AI中文摘要

平衡策略表达性与探索-利用权衡是在线强化学习（RL）中的核心挑战。虽然基于随机微分方程（SDE）的扩散策略可以表示复杂的多模态动作分布，但它们存在两个关键限制：其随机逆过程使熵难以处理（需要启发式探索），并且通过长去噪链计算策略梯度既昂贵又不稳定。在这项工作中，我们表明基于ODE的流匹配通过实现免模拟策略优化和可处理的熵计算，从本质上解决了这些问题。基于此，我们引入了基于镜像下降和熵约束的流匹配策略优化（FMER）。我们的框架以三种方式利用这一见解。首先，我们从理论上证明，最小化优势加权条件流匹配损失可以作为策略镜像下降的免模拟替代。这引导速度场朝向高价值区域，同时完全避免通过ODE求解器进行反向传播。其次，我们推导了一个解析熵目标，该目标校正了由$\tanh$变换（将无界潜在空间映射到有界动作）引起的密度失真，从而促进了有原则的最大熵优化。最后，我们基于有效样本量动态调整镜像下降温度，以在训练期间强制执行稳健的信任区域。实验评估表明，FMER在具有挑战性的稀疏奖励FrankaKitchen环境中实现了优越的性能，同时在标准密集奖励MuJoCo基准测试中保持了有竞争力的结果。

英文摘要

Balancing policy expressiveness with the exploration-exploitation trade-off is a core challenge in online Reinforcement Learning (RL). While Stochastic Differential Equation (SDE)-based diffusion policies can represent complex, multimodal action distributions, they suffer from two critical limitations: their stochastic reverse processes render entropy intractable (necessitating heuristic exploration), and computing policy gradients through long denoising chains is expensive and unstable. In this work, we show that ODE-based flow matching inherently resolves these issues by enabling both simulation-free policy optimization and tractable entropy computation. Building on this, we introduce Flow Matching Policy Optimization with Mirror Descent and Entropy Constraints (FMER). Our framework exploits this insight in three ways. First, we theoretically establish that minimizing an advantage-weighted conditional flow matching loss acts as a simulation-free surrogate for policy mirror descent. This steers the velocity field toward high-value regions while entirely avoiding backpropagation through the ODE solver. Second, we derive an analytic entropy objective that corrects for the density distortion caused by the $\tanh$ transformation (mapping an unbounded latent space to bounded actions), thereby facilitating principled maximum-entropy optimization. Finally, we dynamically tune the mirror descent temperature based on the effective sample size to enforce a robust trust region during training. Empirical evaluations demonstrate that FMER achieves superior performance on the challenging sparse-reward FrankaKitchen environment, while maintaining competitive results across standard dense-reward MuJoCo benchmarks.
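Two ingredients named in the abstract have standard textbook forms that can be sketched briefly: the change-of-variables correction for a $\tanh$-squashed action, and the effective sample size of a set of weights. The sketch below uses those standard formulas and is not the FMER objective itself; the exponential advantage weighting and all function names are assumptions made for illustration.

```python
import math


def tanh_squashed_log_prob(latent_log_prob, latent_action):
    """log p(a) for a = tanh(u), given log p(u).

    Change of variables: log p(a) = log p(u) - sum_i log(1 - tanh(u_i)^2).
    The per-dimension term is computed in a numerically stable form:
    log(1 - tanh(u)^2) = 2 * (log 2 - |u| - log(1 + exp(-2|u|))).
    """
    correction = sum(
        2.0 * (math.log(2.0) - abs(u) - math.log1p(math.exp(-2.0 * abs(u))))
        for u in latent_action
    )
    return latent_log_prob - correction


def advantage_weights(advantages, temperature):
    """Assumed weighting scheme: w_i proportional to exp(A_i / temperature)."""
    top = max(advantages)  # subtract the max to avoid overflow
    return [math.exp((a - top) / temperature) for a in advantages]


def effective_sample_size(weights):
    """Standard ESS estimate: (sum w)^2 / sum w^2."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


if __name__ == "__main__":
    adv = [0.0, 1.0, 2.0]
    for temp in (0.1, 1.0, 10.0):
        print(temp, round(effective_sample_size(advantage_weights(adv, temp)), 3))
```

A lower temperature concentrates weight on a few samples and shrinks the effective sample size, which is why an ESS target can serve as a handle on the temperature.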

2602.21450 2026-05-27 cs.RO cs.SY eess.SY

Vector Fields for Path Following on Lie Groups with Application in Robot Control

李群上的路径跟随向量场及其在机器人控制中的应用

Felipe Bartelt, Luciano C. A. Pimenta, Weijia Yao, Vinicius M. Gonçalves

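AI summary: For path following on Lie groups, proposes a general vector-field framework that guarantees convergence to the desired parametric curve from almost all initial conditions with continuous motion along the path, gives control inputs in a minimal representation on SE(3), and validates the approach in robotic manipulator experiments.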

Comments Manuscript revised: new title, reframed abstract and introduction for robotics, and added a coauthor

AI中文摘要

许多机器人系统允许独立控制位置和姿态（位姿），包括全向飞行器、水下机器人和机械臂末端执行器。在许多应用中，这些系统必须遵循连续的位姿序列，从而形成轨迹跟踪或路径跟随问题。与轨迹跟踪相比，路径跟随具有重要的实际优势。我们特别关注李群上的路径跟随问题。将机器人视为在三维空间中运动的刚体，该路径跟随问题可以表述为在矩阵李群SE(3)上设计引导向量场的问题。在本文中，我们开发了一个通用的向量场框架，用于连通矩阵李群上的路径跟随，其中SE(3)是一个重要的特例。所提出的向量场保证从几乎所有初始条件收敛到期望参数曲线，同时确保沿路径连续运动。此外，另一个有趣的特点是，与先前的工作相比，控制输入在表示上是“最小的”，并且更接近工程应用（例如，在SE(3)情况下的身体扭曲）。在建立一般情况后，该框架被专门应用于机器人学中特别感兴趣的SE(3)，产生了一种适用于实时机器人控制的高效算法。使用机械臂跟踪复杂位姿路径的实验证明了该方法的有效性。还提供了开源实现。

英文摘要

Many robotic systems allow independent control of position and orientation (pose), including omnidirectional aerial vehicles, underwater robots, and manipulator end-effectors. In many applications, these systems must follow a continuous sequence of poses, leading to either trajectory-tracking or path-following formulations. Compared to trajectory tracking, path following offers important practical advantages. In particular, we focus on the problem of path following on Lie groups. Considering the robots as rigid bodies moving in 3D space, this path-following problem can be posed as a problem of designing guiding vector fields on the matrix Lie group SE(3). In this paper, we develop a general vector-field framework for path following on connected matrix Lie groups, of which SE(3) is a prominent special case. The proposed vector field guarantees convergence to a desired parametric curve from almost all initial conditions while ensuring continuous motion along the path. Furthermore, as opposed to previous works, the control input is "minimal" in terms of representation and closer to the engineering application (e.g., the body twist in the case of SE(3)). After establishing the general case, the framework is then specialized to SE(3), of special interest in robotics, yielding an efficient algorithm suitable for real-time robotic control. Experiments with a robotic manipulator tracking complex pose paths demonstrate the effectiveness of the approach. An open-source implementation is also provided.

2601.21008 2026-05-27 cs.LG cs.AI math.OC

ORLoopBench: Solver-in-the-Loop Benchmarks for Self-Correction and Behavioral Rationality in Operations Research

ORLoopBench：运筹学中自我修正与行为理性的求解器在环基准测试

Ruicheng Ao, David Simchi-Levi, Xinshang Wang

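AI summary: Proposes the ORLoopBench benchmark suite, which formalizes infeasible-model repair as a solver-in-the-loop Markov decision process with Irreducible Infeasible Subsystem (IIS) feedback; with solver-verified RLVR training, an 8B model surpasses frontier APIs on LP repair (95.3% vs 92.4% RR@5), and the evaluation reveals semantic drift in whole-model code regeneration.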

Comments 58 pages, accepted by ICML 2026

AI中文摘要

运筹学从业者通过迭代过程调试不可行模型：检查不可约不可行子系统（IIS），识别约束冲突，并修复公式直至恢复可行性。现有的LLM基准大多将OR视为从问题描述到求解器代码的一次性翻译，忽略了这一诊断循环。我们将不可行模型修复形式化为一个求解器在环马尔可夫决策过程，其中每个动作触发求解器重新执行和IIS重新计算，产生确定性的、可验证的反馈。我们引入ORLoopBench，一个包含两个组件的基准套件：OR-Debug-Bench发布5,362个LP/MILP修复实例，而OR-Bias-Bench评估库存设置中的闭式运营决策理性。求解器验证的RLVR训练使8B模型在LP修复上超越前沿API（95.3% vs 92.4% RR@5），改善诊断行为，并迁移到MILP修复。同样的评估暴露了全模型代码再生中的语义漂移：可行的再生MILP可能解决错误的问题。使用求解器预言机的过程级评估能够为可靠的OR自我修正进行针对性训练。

英文摘要

Operations Research practitioners debug infeasible models through an iterative process: inspecting Irreducible Infeasible Subsystems (IIS), identifying constraint conflicts, and repairing formulations until feasibility is restored. Existing LLM benchmarks mostly treat OR as one-shot translation from problem descriptions to solver code, omitting this diagnostic loop. We formalize infeasible-model repair as a solver-in-the-loop Markov Decision Process in which each action triggers solver re-execution and IIS recomputation, yielding deterministic, verifiable feedback. We introduce ORLoopBench, a benchmark suite with two components: OR-Debug-Bench releases 5,362 LP/MILP repair instances, while OR-Bias-Bench evaluates closed-form operational decision rationality across inventory settings. Solver-verified RLVR training enables an 8B model to surpass frontier APIs on LP repair (95.3% vs 92.4% RR@5), improves diagnostic behavior, and transfers to MILP repair. The same evaluation exposes semantic drift in whole-model code regeneration: feasible regenerated MILPs can solve the wrong problem. Process-level evaluation with solver oracles enables targeted training for reliable OR self-correction.

2601.22648 2026-05-27 cs.AI cs.LG

UCPO: Uncertainty-Aware Policy Optimization

UCPO：不确定性感知策略优化

Xianzhou Zeng, Jing Huang, Chunmei Xie, Gongrui Nan, Siye Chen, Mengyu Lu, Weiqi Xiong, Qixuan Zhou, Junhao Zhang, Qiang Zhu, Yadong Li, Xingzhong Xu

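AI summary: To address the advantage bias and overconfidence that existing reinforcement learning paradigms exhibit under uncertainty rewards, proposes Ternary Advantage Decoupling and a Dynamic Uncertainty Reward Adjustment mechanism, significantly improving model reliability beyond its knowledge boundaries.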

Comments Accepted by ICML 2026

AI中文摘要

构建可信赖的大语言模型的关键在于赋予其内在的不确定性表达能力，从而减轻高风险应用中的过度自信错误。然而，现有的强化学习范式（如GRPO）由于二元决策空间和静态不确定性奖励，常常遭受优势偏差，导致过度保守或过度自信。为了解决这一挑战，本文揭示了当前结合不确定性奖励的强化学习范式中奖励破解和过度自信的根本原因，并在此基础上提出了不确定性感知策略优化（UCPO）框架。UCPO采用三元优势解耦来分离并独立归一化确定性和不确定性轨迹，从而消除优势偏差。此外，动态不确定性奖励调整机制根据模型演化和实例难度实时调整不确定性权重。在数学推理和通用任务上的实验结果表明，UCPO有效解决了奖励不平衡问题，显著提高了模型在知识边界外的可靠性。

英文摘要

The key to building trustworthy large language models (LLMs) lies in endowing them with inherent uncertainty expression capabilities, thereby mitigating overconfident errors in high-stakes applications. However, existing RL paradigms such as GRPO often suffer from Advantage Bias due to binary decision spaces and static uncertainty rewards, inducing either excessive conservatism or overconfidence. To tackle this challenge, this paper unveils the root causes of reward hacking and overconfidence in current RL paradigms incorporating uncertainty-based rewards, based on which we propose the UnCertainty-Aware Policy Optimization (UCPO) framework. UCPO employs Ternary Advantage Decoupling to separate and independently normalize deterministic and uncertain rollouts, thereby eliminating advantage bias. Furthermore, a Dynamic Uncertainty Reward Adjustment mechanism adapts uncertainty weights in real time according to model evolution and instance difficulty. Experimental results on mathematical reasoning and general tasks demonstrate that UCPO effectively resolves the reward imbalance, significantly improving the reliability of the model beyond its knowledge boundaries.

2511.20586 2026-05-27 cs.AI cs.LG

PaTAS: A Framework for Trust Propagation in Neural Networks Using Subjective Logic

PaTAS：基于主观逻辑的神经网络信任传播框架

Koffi Ismael Ouattara, Ioannis Krontiris, Theo Dimitrakos, Dennis Eisermann, Houda Labiod, Frank Kargl

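AI summary: Proposes the PaTAS framework, which uses Subjective Logic to propagate trust in parallel with neural network computation, quantifies input, parameter, and activation trust through Trust Nodes and Trust Functions, and defines a Parameter Trust Update and an Inference-Path Trust Assessment method to provide interpretable trust estimates under adversarial or degraded conditions.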

AI中文摘要

可信度已成为安全关键应用中人工智能系统部署的关键要求。传统的评估指标（如准确率和精确率）无法充分捕捉不确定性或模型预测的可靠性，尤其是在对抗或退化条件下。本文介绍了并行信任评估系统（PaTAS），这是一个使用主观逻辑（SL）对神经网络中的信任进行建模和传播的框架。PaTAS通过信任节点和信任函数与标准神经计算并行运行，这些节点和函数在网络中传播输入、参数和激活信任。该框架定义了一种参数信任更新机制，以在训练过程中优化参数可靠性，以及一种推理路径信任评估（IPTA）方法，以在推理时计算实例特定的信任。在真实世界和对抗性数据集上的实验表明，PaTAS产生可解释、对称且收敛的信任估计，这些估计补充了准确率，并揭示了在中毒、有偏或不确定数据场景中的可靠性差距。结果表明，PaTAS有效区分良性输入和对抗性输入，并识别模型置信度与实际可靠性不一致的情况。通过在神经架构中实现透明且可量化的信任推理，PaTAS为评估AI生命周期中的模型可靠性提供了基础。

英文摘要

Trustworthiness has become a key requirement for the deployment of artificial intelligence systems in safety-critical applications. Conventional evaluation metrics, such as accuracy and precision, fail to appropriately capture uncertainty or the reliability of model predictions, particularly under adversarial or degraded conditions. This paper introduces the Parallel Trust Assessment System (PaTAS), a framework for modeling and propagating trust in neural networks using Subjective Logic (SL). PaTAS operates in parallel with standard neural computation through Trust Nodes and Trust Functions that propagate input, parameter, and activation trust across the network. The framework defines a Parameter Trust Update mechanism to refine parameter reliability during training and an Inference-Path Trust Assessment (IPTA) method to compute instance-specific trust at inference. Experiments on real-world and adversarial datasets demonstrate that PaTAS produces interpretable, symmetric, and convergent trust estimates that complement accuracy and expose reliability gaps in poisoned, biased, or uncertain data scenarios. The results show that PaTAS effectively distinguishes between benign and adversarial inputs and identifies cases where model confidence diverges from actual reliability. By enabling transparent and quantifiable trust reasoning within neural architectures, PaTAS provides a foundation for evaluating model reliability across the AI lifecycle.
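For readers unfamiliar with Subjective Logic, the basic object is a binomial opinion (belief, disbelief, uncertainty, base rate) with belief + disbelief + uncertainty = 1 and projected probability belief + base_rate * uncertainty. The sketch below shows only this standard representation and the usual evidence-to-opinion mapping with a prior weight of 2; it does not reproduce PaTAS's Trust Nodes or Trust Functions, and the class and function names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    """Binomial Subjective Logic opinion about a single proposition."""
    belief: float
    disbelief: float
    uncertainty: float
    base_rate: float = 0.5

    def __post_init__(self):
        total = self.belief + self.disbelief + self.uncertainty
        if abs(total - 1.0) > 1e-9:
            raise ValueError("belief + disbelief + uncertainty must equal 1")

    def projected_probability(self) -> float:
        # Uncertainty mass is split according to the base rate.
        return self.belief + self.base_rate * self.uncertainty


def opinion_from_evidence(positive, negative, prior_weight=2.0, base_rate=0.5):
    """Map counts of supporting and contradicting evidence to an opinion."""
    total = positive + negative + prior_weight
    return Opinion(positive / total, negative / total, prior_weight / total, base_rate)


if __name__ == "__main__":
    print(opinion_from_evidence(8, 0))  # strong support, some uncertainty left
    print(opinion_from_evidence(0, 0))  # vacuous opinion: all uncertainty
```

Unlike a bare probability, the opinion distinguishes "0.5 because the evidence is balanced" from "0.5 because there is no evidence", which is the kind of gap between confidence and reliability the abstract refers to.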

2511.14683 2026-05-27 cs.CL

Quadratic Term Correction on Heaps' Law

Heaps定律的二次项修正

Oscar Fontanelli, Wentian Li

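AI summary: Because the Heaps' law type-token curve remains slightly concave even in log-log scale, proposes a quadratic fit in log-log scale, validated on twenty English novels; the linear coefficient is slightly larger than 1, the quadratic coefficient is about -0.02, and the curvature is related to a "pseudo-variance".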

Comments 3 figures

AI中文摘要

Heap或Herdan定律通过幂律函数表征词类型与词例之间的关系，该函数在线性-线性尺度上是凹的，但在双对数尺度上是直线。然而，即使在双对数尺度上，类型-词例曲线仍轻微凹形，使幂律关系失效。作为下一阶近似，我们通过二十部英文小说（部分从其他语言翻译成英文）证明，双对数尺度下的二次函数能完美拟合类型-词例数据。对log(类型)-log(词例)数据同时包含线性和二次项的回归分析一致地得出线性系数略大于1，二次系数约为-0.02。利用“从袋中有放回地随机抽取彩色球”模型，我们证明双对数尺度的曲率等于一个负的“伪方差”。尽管当词例数量较大时，由于伪权重值较大，伪方差计算可能遇到数值不稳定性，但该形式为词例数量较小时提供了曲率的粗略估计。

英文摘要

Heaps' or Herdan's law characterizes the word-type vs. word-token relation by a power-law function, which is concave in linear-linear scale but a straight line in log-log scale. However, it has been observed that even in log-log scale, the type-token curve is still slightly concave, invalidating the power-law relation. At the next-order approximation, we have shown, using twenty English novels or writings (some translated from another language into English), that quadratic functions in log-log scale fit the type-token data perfectly. Regression analyses of log(type)-log(token) data with both a linear and a quadratic term consistently lead to a linear coefficient slightly larger than 1 and a quadratic coefficient around -0.02. Using the "random drawing of colored balls from a bag with replacement" model, we have shown that the curvature in log-log scale is identical to a "pseudo-variance", which is negative. Although a pseudo-variance calculation may encounter numeric instability when the number of tokens is large, due to the large values of pseudo-weights, this formalism provides a rough estimation of the curvature when the number of tokens is small.
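The regression described above, log(type) against log(token) with a linear and a quadratic term, can be sketched with an ordinary least-squares polynomial fit. This is a minimal sketch using NumPy's `polyfit`; the function names are assumptions, natural logarithms are assumed (the abstract does not state the base), and the demo data are synthetic values chosen only to echo the coefficients quoted in the abstract.

```python
import numpy as np


def type_token_curve(tokens):
    """Return (token counts 1..N, number of distinct types seen so far)."""
    seen = set()
    types_so_far = []
    for token in tokens:
        seen.add(token)
        types_so_far.append(len(seen))
    return np.arange(1, len(tokens) + 1), np.array(types_so_far)


def fit_log_log_quadratic(n_tokens, n_types):
    """Fit log V = c0 + c1 * log N + c2 * (log N)^2 by least squares."""
    x = np.log(np.asarray(n_tokens, dtype=float))
    y = np.log(np.asarray(n_types, dtype=float))
    quad, lin, const = np.polyfit(x, y, deg=2)  # highest degree first
    return const, lin, quad


if __name__ == "__main__":
    # Synthetic curve with a slightly negative quadratic coefficient.
    n = np.array([10, 100, 1_000, 10_000, 100_000], dtype=float)
    v = np.exp(0.1 + 1.05 * np.log(n) - 0.02 * np.log(n) ** 2)
    print(fit_log_log_quadratic(n, v))
```

A pure power law corresponds to a quadratic coefficient of zero, so the sign and size of the third returned value measure the departure from Heaps' law.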

2510.06381 2026-05-27 cs.LG cs.AI

Monte Carlo Permutation Search

蒙特卡洛排列搜索

Tristan Cazenave

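AI summary: Proposes MCPS, a general Monte Carlo tree search algorithm that improves on GRAVE by using the statistics of all nodes along the path; it outperforms GRAVE in a variety of games, and a mathematical derivation of the statistical weighting formula is given.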

2509.09977 2026-05-27 cs.CV

ISTASTrack: Bridging ANN and SNN via ISTA Adapter for RGB-Event Tracking

ISTASTrack：通过ISTA适配器桥接ANN和SNN用于RGB-事件跟踪

Siying Liu, Zikai Wang, Hanle Zheng, Yifan Hu, Xilin Wang, Qingkai Yang, Jibin Wu, Hao Guo, Lei Deng

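AI summary: Proposes ISTASTrack, the first transformer-based ANN-SNN hybrid tracker, which uses ISTA adapters to fuse RGB and event features bidirectionally for efficient and robust tracking.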

Comments Accepted by IEEE Transactions on Image Processing, DOI: 10.1109/TIP.2026.3694138, 15 pages, 8 figures

AI中文摘要

RGB-事件跟踪已成为视觉目标跟踪中一个有前景的趋势，旨在利用RGB图像和动态尖峰事件的互补优势来提高性能。然而，现有的人工神经网络（ANN）难以充分利用事件流的稀疏和异步特性。最近，结合ANN和脉冲神经网络（SNN）的混合架构研究作为RGB-事件感知中的一种有前途的解决方案出现，但有效融合跨异构范式的特征仍然是一个挑战。在这项工作中，我们提出了ISTASTrack，这是第一个基于Transformer的ANN-SNN混合跟踪器，配备了ISTA适配器用于RGB-事件跟踪。该双分支模型采用视觉Transformer从RGB输入中提取空间上下文，并使用脉冲Transformer从事件流中捕获时空动态。为了弥合ANN和SNN特征之间的模态和范式差距，我们系统地设计了一个基于模型的ISTA适配器，用于两个分支之间的双向特征交互，该适配器通过展开迭代收缩阈值算法从稀疏表示理论推导而来。此外，我们在适配器中引入了一个时间下采样注意力模块，以在潜在空间中对齐多步SNN特征与单步ANN特征，从而改善时间融合。在RGB-事件跟踪基准（如FE240hz、VisEvent、COESOT和FELT）上的实验结果表明，ISTASTrack在保持高能效的同时实现了最先进的性能，突显了混合ANN-SNN设计在鲁棒视觉跟踪中的有效性和实用性。代码已公开在https://github.com/lsying009/ISTASTrack.git。

英文摘要

RGB-Event tracking has become a promising trend in visual object tracking to leverage the complementary strengths of both RGB images and dynamic spike events for improved performance. However, existing artificial neural networks (ANNs) struggle to fully exploit the sparse and asynchronous nature of event streams. Recent efforts toward hybrid architectures combining ANNs and spiking neural networks (SNNs) have emerged as a promising solution in RGB-Event perception, yet effectively fusing features across heterogeneous paradigms remains a challenge. In this work, we propose ISTASTrack, the first transformer-based ANN-SNN hybrid tracker equipped with ISTA adapters for RGB-Event tracking. The two-branch model employs a vision transformer to extract spatial context from RGB inputs and a spiking transformer to capture spatio-temporal dynamics from event streams. To bridge the modality and paradigm gap between ANN and SNN features, we systematically design a model-based ISTA adapter for bidirectional feature interaction between the two branches, derived from sparse representation theory by unfolding the iterative shrinkage thresholding algorithm. Additionally, we incorporate a temporal downsampling attention module within the adapter to align multi-step SNN features with single-step ANN features in the latent space, improving temporal fusion. Experimental results on RGB-Event tracking benchmarks, such as FE240hz, VisEvent, COESOT, and FELT, have demonstrated that ISTASTrack achieves state-of-the-art performance while maintaining high energy efficiency, highlighting the effectiveness and practicality of hybrid ANN-SNN designs for robust visual tracking. The code is publicly available at https://github.com/lsying009/ISTASTrack.git.
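The adapter is described as an unfolding of the iterative shrinkage thresholding algorithm (ISTA). As background, the sketch below shows classical ISTA for the sparse coding problem min_x 0.5 * ||Ax - y||^2 + lam * ||x||_1. It is the textbook algorithm only, not the learned adapter from the paper; in an unfolded network each iteration would typically become a layer with trainable parameters, which is not shown here.

```python
import numpy as np


def soft_threshold(v, threshold):
    """Proximal operator of threshold * ||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - threshold, 0.0)


def ista(A, y, lam, n_iters=100):
    """Classical ISTA: a gradient step on the quadratic term, then shrinkage."""
    lipschitz = np.linalg.norm(A, 2) ** 2  # squared largest singular value
    step = 1.0 / lipschitz
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        gradient = A.T @ (A @ x - y)
        x = soft_threshold(x - step * gradient, step * lam)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((30, 60))
    x_true = np.zeros(60)
    x_true[[3, 17, 42]] = [1.5, -2.0, 1.0]
    x_hat = ista(A, A @ x_true, lam=0.1, n_iters=500)
    print(np.flatnonzero(np.abs(x_hat) > 0.1))
```

The step size 1/L with L the squared spectral norm of A is the usual choice that guarantees a non-increasing objective.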

2501.00520 2026-05-27 cs.CV cs.LG

Innovative Silicosis and Pneumonia Classification: Leveraging Graph Transformer Post-hoc Modeling and Ensemble Techniques

创新性矽肺和肺炎分类：利用图Transformer后验建模与集成技术

Bao Q. Bui, Tien T. T. Nguyen, Duy M. Le, Cong Tran, Cuong Pham

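AI summary: Proposes an architecture that combines graph transformer networks with a traditional deep neural network, together with a Balanced Cross-Entropy loss and an ensemble method, reporting high-accuracy silicosis and pneumonia classification on a self-built chest X-ray dataset.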

Comments Withdrawn by the authors because the manuscript contains incomplete and potentially misleading descriptions of the dataset construction and evaluation protocol, particularly in the Dataset and Experimental Setup sections. The work should not be cited or used as an independent reference in its current form

AI中文摘要

本文对矽肺相关肺部炎症的分类与检测进行了全面研究。我们的主要贡献包括：1) 创建了一个名为SVBCX的新策划胸部X光（CXR）图像数据集，该数据集针对不同病原体引起的肺部炎症的细微差别进行了定制，为矽肺和肺炎研究社区提供了宝贵资源；2) 提出了一种新颖的深度学习架构，该架构将图Transformer网络与传统深度神经网络模块相结合，用于有效分类矽肺和肺炎。此外，我们采用平衡交叉熵（BalCE）作为损失函数，以确保不同类别之间的更均匀学习，增强模型辨别肺部状况细微差异的能力。所提出的模型架构和损失函数选择旨在提高炎症检测的准确性和可靠性，特别是在矽肺背景下。此外，我们的研究探索了一种集成方法的有效性，该方法结合了不同模型架构的优势。在构建的数据集上的实验结果表明，与基线模型相比，取得了显著改进。模型集成实现了宏F1分数0.9749，每个类别的AUC ROC分数超过0.99，突显了我们的方法在准确和鲁棒的肺部炎症分类中的有效性。

英文摘要

This paper presents a comprehensive study on the classification and detection of Silicosis-related lung inflammation. Our main contributions include 1) the creation of a newly curated chest X-ray (CXR) image dataset named SVBCX that is tailored to the nuances of lung inflammation caused by distinct agents, providing a valuable resource for the silicosis and pneumonia research community; and 2) a novel deep-learning architecture that integrates graph transformer networks alongside a traditional deep neural network module for the effective classification of silicosis and pneumonia. Additionally, we employ the Balanced Cross-Entropy (BalCE) as a loss function to ensure more uniform learning across different classes, enhancing the model's ability to discern subtle differences in lung conditions. The proposed model architecture and loss function selection aim to improve the accuracy and reliability of inflammation detection, particularly in the context of Silicosis. Furthermore, our research explores the efficacy of an ensemble approach that combines the strengths of diverse model architectures. Experimental results on the constructed dataset demonstrate promising outcomes, showcasing substantial enhancements compared to baseline models. The ensemble of models achieves a macro-F1 score of 0.9749 and AUC ROC scores exceeding 0.99 for each class, underscoring the effectiveness of our approach in accurate and robust lung inflammation classification.

2605.27299 2026-05-27 cs.CR cs.AI cs.HC cs.LG cs.SY eess.SY

Risk Averse Alert Prioritization for IDS Using Subnormal Gaussian Fuzzy Models

使用次正态高斯模糊模型的IDS风险规避警报优先级排序

Murat Moran

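AI summary: Proposes an alert prioritization framework based on subnormal Gaussian fuzzy numbers that models three sources of uncertainty (threat severity, detection confidence, and organizational risk attitude) and uses ranking indices to give a tunable security posture; experiments show greater robustness than baselines under detector degradation.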

AI中文摘要

现代入侵检测系统每天生成数千条警报，但由于误报或低影响事件过多，警报疲劳严重限制了安全运营的有效性。我们通过提出一个基于次正态高斯模糊数的原则性警报优先级排序框架来解决这个问题，该框架明确建模了三种不确定性来源：威胁严重性、检测置信度和组织风险态度。每个警报被表示为一个模糊数，其核心表示严重性，展度表示不确定性，高度反映检测可靠性。我们应用排序指数对警报进行优先级排序，允许组织通过风险态度参数调整安全姿态。在CIC-IDS2017和NSL-KDD上的实验验证表明，在检测器退化下，该方法比基线方法具有更强的鲁棒性（NDCGrel@100为0.9963对比0.8215），在中等置信度警报中具有明显区分度，在稳健检测器下与基线方法接近。该框架具有理论基础、计算效率高、提供可解释推理，并且在检测器系列和校准错误场景下保持鲁棒性。

英文摘要

Modern intrusion detection systems generate thousands of alerts daily, but alert fatigue severely limits security operations effectiveness due to too many false positives or low-impact events. We address this by proposing a principled framework for alert prioritization based on subnormal Gaussian fuzzy numbers, explicitly modeling three sources of uncertainty: threat severity, detection confidence, and organizational risk attitude. Each alert is represented as a fuzzy number with the core indicating severity, spread indicating uncertainty, and height reflecting detection reliability. We apply ranking indices to prioritize alerts, allowing organizations to tune security posture through a risk-attitude parameter. Experimental validation on CIC-IDS2017 and NSL-KDD demonstrates greater robustness than baselines under detector degradation (0.9963 vs 0.8215 NDCGrel@100), with distinct differentiation in mid-confidence alerts and near-parity with baselines under robust detectors. The framework is theoretically grounded, computationally efficient, provides interpretable reasoning, and remains robust across detector families and miscalibration scenarios.
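The alert representation described above (core for severity, spread for uncertainty, height for detection reliability) can be sketched as a Gaussian membership function scaled by a height below 1. In the sketch below the membership function follows that description, but `ranking_score` is a purely hypothetical index invented for illustration; the abstract does not specify the ranking indices the paper uses, and the alert names and values are made up.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FuzzyAlert:
    name: str
    core: float    # most plausible severity, e.g. on a 0..1 scale
    spread: float  # Gaussian width: uncertainty about the severity
    height: float  # peak membership below 1: detection reliability

    def membership(self, x: float) -> float:
        """Subnormal Gaussian membership: height * exp(-(x - core)^2 / (2 spread^2))."""
        return self.height * math.exp(-((x - self.core) ** 2) / (2.0 * self.spread ** 2))


def ranking_score(alert: FuzzyAlert, risk_attitude: float) -> float:
    """Hypothetical index (not from the paper): shift the core by the spread
    according to risk_attitude in [-1, 1], then scale by the height."""
    return alert.height * (alert.core + risk_attitude * alert.spread)


def prioritize(alerts, risk_attitude=0.0):
    """Sort alerts from highest to lowest hypothetical ranking score."""
    return sorted(alerts, key=lambda a: ranking_score(a, risk_attitude), reverse=True)


if __name__ == "__main__":
    alerts = [
        FuzzyAlert("A", core=0.9, spread=0.05, height=0.4),
        FuzzyAlert("B", core=0.6, spread=0.10, height=0.95),
    ]
    print([a.name for a in prioritize(alerts)])
```

In this toy index a positive `risk_attitude` treats uncertain alerts as potentially more severe, which is one possible reading of a risk-averse posture.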

2605.26590 2026-05-27 cs.CY cs.AI

Examining the Challenges of Intellectual Property in AI-Generated Productions

审视人工智能生成作品中的知识产权挑战

Ali Mazhar, Mohammad Zare, Marjan Veysi

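AI summary: Compares the legal frameworks of Iran, the European Union, the United Kingdom, and the United States to analyze ownership and legal challenges in intellectual property protection for AI-generated works, and recommends revising existing laws or introducing new types of rights.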

Journal ref: New Researches in the Smart City, Vol. 3, No. 4, Summer 2025

AI中文摘要

随着能够自主生成艺术、文学、音乐作品甚至发明而无需直接人工干预的人工智能系统的进步，知识产权制度面临前所未有的问题和挑战。最关键的问题涉及在缺乏人类创作者的情况下道德和经济权利的所有权，以及如何为这些产出提供法律保护。本文首先回顾了这一领域的理论基础和现有文献，然后比较研究了伊朗的法律框架，如1969年《作者、作曲家和艺术家权利保护法》和《专利和商标注册法》，以及其他法律体系，包括欧盟、英国和美国。此外，还分析了关于人工智能生成作品知识产权的现有法律观点及相关执法挑战。研究结果揭示了当前伊朗法律框架内的重大监管空白。为了在促进创新与保护人类创造力之间取得平衡，修订现有法律并引入新方法，例如为人工智能生成作品定义特定的知识产权或指定相关人类代理人之间的所有权，似乎是必要的。

英文摘要

With the advancement of artificial intelligence systems capable of autonomously generating artistic, literary, and musical works, and even inventions, without direct human intervention, the intellectual property (IP) regime faces unprecedented questions and challenges. The most critical issue concerns the ownership of moral and economic rights in the absence of a human creator, and how such outputs can be granted legal protection. This paper first reviews the theoretical foundations and existing literature in this domain, then comparatively examines Iranian legal frameworks, such as the 1969 Law for the Protection of Authors, Composers, and Artists Rights and the Patent and Trademark Registration Law, alongside other legal systems, including the European Union, the United Kingdom, and the United States. Furthermore, existing legal perspectives on the intellectual property of AI-generated works and the related enforcement challenges are analyzed. The findings reveal significant regulatory gaps within the current Iranian legal framework. To balance the promotion of innovation with the preservation of human creativity, revising existing laws and introducing novel approaches, such as defining a specific intellectual property right for AI-generated works or designating ownership among associated human agents, appears to be essential.

2605.26168 2026-05-27 cs.OS cs.LG

LearnedCache: An eBPF-Integrated Perceptron-Based Eviction Policy for the Linux Page Cache

LearnedCache: 一种基于eBPF集成的感知器的Linux页面缓存驱逐策略

Zejia Qi

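AI summary: Proposes LearnedCache, a Linux page cache eviction policy based on eBPF and a single-layer perceptron, trained on real kernel data, achieving up to a 10% improvement in insertion rate under representative workloads.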

Comments 11 pages, 12 figures, 4 listings. Policies and harnesses: https://github.com/JayAndJef/cache_ext_lc . Model and visualizations: https://github.com/JayAndJef/learnedcache

AI中文摘要

Linux是数字时代的基础，占据了云和移动操作系统市场的大部分份额。任何运行Linux的设备都使用Linux页面缓存，这是操作系统和应用程序性能的核心支柱，旨在减少不必要的磁盘访问。许多页面缓存驱逐策略已被开发，但仍受限于启发式方法的僵化。近年来，AI驱动工具的兴起，加上Linux设备工作负载的日益多样化，为机器学习驱动的缓存驱逐策略奠定了基础。该领域已有有前景的研究，但仅限于CDN等用户空间应用。我们开发了LearnedCache，一种基于eBPF集成的单层感知器的Linux页面缓存驱逐策略，使用来自多样化工作负载的真实内核数据进行训练。我们展示了多个线性模型在建模页面重用时间上的中位AUC接近80%，然后进一步将这些模型嵌入Linux内核以进行实时性能评估。通过对每个工作负载与FIFO基线进行50次配对试验的统计测试，LearnedCache表明，在代表性经验工作负载下，机器学习驱动的缓存驱逐策略在Linux内核中是可行的，并且能够在特定工作负载下以统计显著的优势超越传统FIFO，插入率（缓存命中率的频率调整派生指标）提升高达10%，同时开销极小。

英文摘要

Linux is the foundation of the digital age, accounting for the majority of the cloud and mobile OS markets. Any device that runs Linux uses the Linux page cache, a central pillar in OS and application performance, serving to reduce extraneous disk access. Many page cache eviction policies have been developed but remain bound by the rigidity of heuristics. The rise of AI-driven tools in recent years, melded with the ever-increasing variety of workloads for Linux devices, sets the stage for machine-learning-driven cache eviction policies. Promising research has been done in this field, but only for user-space applications such as CDNs. We develop LearnedCache, an eBPF-integrated single-layer perceptron-based cache eviction policy for the Linux page cache, trained on real kernel data from diverse workloads. We demonstrate median AUCs of nearly 80% over multiple linear models modeling page reuse time, then take a step further by embedding these models inside the Linux kernel for real-time performance evaluation. Through statistical testing over 50 paired trials against a baseline of FIFO for each workload, LearnedCache reveals that machine-learning-derived cache eviction policies are practical in the Linux kernel under representative empirical workloads and are able to surpass conventional FIFO by statistically significant margins of up to 10% in insertion rate, a frequency-adjusted derivation of cache hit rate, in specific workloads while incurring minimal overhead.

2605.14151 2026-05-27 math.OC cs.LG

Stochastic global optimization of continuous functions via random walks on Grassmannians

通过Grassmann流形上的随机游走实现连续函数的随机全局优化

Kartik Gupta, Stephen D. Miller, Pradeep Ravikumar, Ramarathnam Venkatesan

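AI summary: Proposes a global optimization method based on random walks on Grassmannian manifolds that randomly samples low-dimensional subspaces and solves the subspace-restricted problems with a black-box optimizer; its convergence guarantees hold without convexity or smoothness, depending only on the geometric distribution of restricted minima, and it has a blind-spot robustness property.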

Comments 21 pages

AI中文摘要

我们提出了一种基于Grassmann流形上随机游走的随机全局优化方法。为了最小化连续目标函数 $\ell:\mathbb{R}^d\rightarrow\mathbb{R}$，该方法反复随机采样 $k$ 维线性子空间（其中 $k\ll d$），使用任意黑盒优化器求解这些子空间上的低维限制问题，并更新迭代点（该迭代点单调地优于前一个迭代点）。与依赖凸性、光滑性、Lipschitz界或Polyak-Lojasiewicz型条件的经典优化分析不同，我们的收敛保证仅取决于通过 $\mathbb{R}^d$ 中给定点的 $k$ 维子空间上限制极小值的几何分布。我们确定了一个间隙参数——类似于随机游走的谱间隙——它控制迭代点接近全局最小值的速率。最后，我们论证了相同的分析产生了一种盲点鲁棒性：损失函数中足够窄且深的凹陷（$\ell$ 向下尖峰的小测度区域）对算法轨迹的影响有限，因为它们不太可能被随机子空间采样遇到。

英文摘要

We introduce a stochastic global optimization method based on random walks on Grassmannian manifolds. To minimize a continuous objective $\ell:\mathbb{R}^d\rightarrow\mathbb{R}$, the method repeatedly samples random $k$-dimensional linear subspaces (with $k\ll d$), solves the resulting low-dimensional restrictions of the problem to these subspaces using an arbitrary black-box optimizer, and updates the iterate (which monotonically improves upon the previous iterate). Unlike classical optimization analyses that rely on convexity, smoothness, Lipschitz bounds, or Polyak-Lojasiewicz-type conditions, our convergence guarantees depend only on the geometric distribution of restricted minima across the $k$-dimensional subspaces passing through a given point in $\mathbb{R}^d$. We identify a gap parameter -- an analogue of a spectral gap for random walks -- that controls the rate at which the iterates approach the global minimum value. Finally, we argue that the same analysis yields a blind-spot robustness property: sufficiently narrow, deep dips of the loss function (small-measure regions where $\ell$ spikes downward) have limited influence on the algorithm's trajectory, since they are unlikely to be encountered by random subspace sampling.
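The loop described above (sample a random $k$-dimensional subspace through the current iterate, minimize the restricted objective with a black-box optimizer, keep the result if it is no worse) can be sketched as follows. This is a simplified sketch under stated assumptions: subspaces are drawn independently and uniformly via QR of a Gaussian matrix rather than by the paper's random walk, and `crude_black_box_minimize` is a deliberately naive random search standing in for the "arbitrary black-box optimizer". All function names are hypothetical.

```python
import numpy as np


def random_subspace_basis(d, k, rng):
    """Orthonormal basis (d x k) of a uniformly random k-dimensional subspace."""
    q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    return q


def crude_black_box_minimize(f, k, rng, n_samples=200, radius=1.0):
    """Stand-in optimizer: random search that starts from z = 0, so the
    returned value is never worse than staying at the current iterate."""
    best_z = np.zeros(k)
    best_val = f(best_z)
    for _ in range(n_samples):
        z = radius * rng.standard_normal(k)
        val = f(z)
        if val < best_val:
            best_z, best_val = z, val
    return best_z, best_val


def random_subspace_search(loss, x0, k, n_rounds, seed=0):
    """Repeatedly minimize the loss restricted to x + span(basis)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    history = [loss(x)]
    for _ in range(n_rounds):
        basis = random_subspace_basis(x.size, k, rng)

        def restricted(z, x=x, basis=basis):
            return loss(x + basis @ z)

        z, val = crude_black_box_minimize(restricted, k, rng)
        x = x + basis @ z
        history.append(val)
    return x, history


if __name__ == "__main__":
    nonsmooth = lambda x: float(np.sum(np.abs(x)))
    _, hist = random_subspace_search(nonsmooth, np.ones(20), k=2, n_rounds=30)
    print(round(hist[0], 3), round(hist[-1], 3))
```

Because each inner search includes z = 0 as a candidate, the recorded loss values are non-increasing, matching the monotone-improvement property stated in the abstract.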
