arXivDaily arXiv daily academic digest, updated Monday through Friday
2505.23277 2026-06-15 cs.CL cs.AI Updated version

Sentinel: Decoding Context Utilization via Attention Probing for Efficient LLM Context Compression

Yong Zhang, Heng Li, Yanwen Huang, Ning Cheng, Yang Guo, Yun Zhu, Yanmeng Wang, Shaojun Wang, Jing Xiao

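Affiliations Ping An Technology (Shenzhen) Co., Ltd., China; University of Science and Technology of China; University of Electronic Science and Technology of China

AI summary Proposes Sentinel, a lightweight sentence-level compression framework that decodes inference-time context utilization from the head-wise attention patterns of a frozen LLM and compresses in a single non-autoregressive forward pass; on LongBench, a 0.5B proxy model reaches up to 5x compression with QA performance competitive with methods built on 7B-scale models.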

Comments Preprint

Abstract

Retrieval-augmented generation (RAG) often suffers from long and noisy retrieved contexts. Existing context compression methods typically rely on heuristic relevance estimation or supervised compression models rather than on how LLMs utilize retrieved context during inference. We propose Sentinel, a lightweight sentence-level compression framework that decodes inference-time contextual utilization behaviors from head-wise attention patterns of frozen LLMs. To ground supervision in retrieval-dependent answering behavior, Sentinel trains a lightweight probe using QA examples where the model succeeds only when retrieved context is available. Sentinel performs compression using only a single non-autoregressive forward pass without dedicated compression training or autoregressive scoring. Empirically, we find that effective contextual utilization signals remain accessible even in compact proxy models. On LongBench, Sentinel with a 0.5B proxy model achieves up to 5$\times$ compression while attaining question-answering performance competitive with compression methods built on 7B-scale models. Despite being trained only on English QA data, Sentinel also generalizes effectively to Chinese and out-of-domain settings.
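
The abstract describes scoring sentences from head-wise attention and a lightweight probe. A minimal sketch of that idea follows, assuming the attention from the final query token has already been extracted from one forward pass of a frozen model, with layers and heads flattened into one axis. The function names, the normalization, the logistic probe, and the greedy budget rule are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sentence_features(attn, spans):
    """Per-sentence attention mass for each head.

    attn:  (num_heads, seq_len) attention from the final query token
           to every position (hypothetical input layout).
    spans: list of (start, end) token ranges, one per context sentence.
    """
    feats = np.stack([attn[:, s:e].sum(axis=1) for s, e in spans])
    total = feats.sum(axis=0, keepdims=True)
    # Normalize so each head's mass over the sentences sums to 1.
    return feats / np.maximum(total, 1e-12)

def probe_scores(feats, w, b):
    """Logistic probe: one relevance score in (0, 1) per sentence."""
    return 1.0 / (1.0 + np.exp(-(feats @ w + b)))

def compress(sentences, scores, token_counts, budget):
    """Greedily keep the highest-scoring sentences that fit the token
    budget, then restore the original sentence order."""
    kept, used = [], 0
    for i in np.argsort(-scores):
        if used + token_counts[i] <= budget:
            kept.append(int(i))
            used += token_counts[i]
    return [sentences[i] for i in sorted(kept)]

# Toy example: 2 heads, 6 tokens, 3 two-token sentences.
attn = np.array([[0.1, 0.1, 0.5, 0.1, 0.1, 0.1],
                 [0.0, 0.1, 0.4, 0.3, 0.1, 0.1]])
spans = [(0, 2), (2, 4), (4, 6)]
scores = probe_scores(sentence_features(attn, spans), np.array([1.0, 1.0]), 0.0)
kept = compress(["A", "B", "C"], scores, [2, 2, 2], budget=4)
```

In this toy input the middle sentence receives most of the attention mass, so it is kept first and the remaining budget goes to the next-highest score.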

2601.14033 2026-06-15 cs.LG cs.CR Updated version

Private Prediction via PAC Privacy

Xiaochen Zhu, Mayuri Sridhar, Srinivas Devadas

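Affiliations Massachusetts Institute of Technology

AI summary For private prediction behind APIs, proposes instance-based noise calibration under PAC privacy and proves that mutual information accumulates only linearly under adaptive queries; reaches high accuracy on CIFAR-10 at a tiny per-query budget and supports releasing a freely queryable model through distillation.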

Abstract

Machine learning models are increasingly served behind APIs. This renders private prediction, i.e., privatizing a model's outputs rather than its parameters, a natural privacy target: model outputs are lower-dimensional and far more stable to training-data changes than weights. While differential privacy (DP) cannot effectively exploit this as it calibrates noise to worst-case sensitivity that is intractable to bound for non-convex models, we argue that PAC privacy is a natural fit for private prediction. It is instance-based, and calibrates noise to a black-box function's empirical stability to control mutual-information (MI) leakage. The missing ingredient is efficient, adaptive composition. Serving predictions means answering a long stream of adaptively chosen queries from untrusted users; existing composition either fails under adaptivity, grows quadratically, or reverts to input-independent, DP-like noise. We close this gap with a new adversarial composition result via adaptive noise calibration and prove that MI accumulates only linearly under adaptive and adversarial querying. Experiments across modalities show that prediction stability enables high utility even at a tiny per-query budget: on CIFAR-10, we achieve 87.79% accuracy with a per-query MI budget of $2^{-32}$. This enables serving one million queries while provably bounding membership-inference success to 51.08% -- the same guarantee as $(0.04, 10^{-5})$-DP. Further, in the presence of auxiliary public data, the large volume of PAC-private predictions enables us to distill a publishable model that can be queried without limit. Concretely, 210,000 private labels on an ImageNet subset distill into a student reaching 91.86% accuracy on CIFAR-10 with membership inference success bounded by 50.49%, comparable to $(0.02, 10^{-5})$-DP.

2511.07368 2026-06-15 cs.LG cs.AI Updated version

Distributional Biases in Post-Training: A Markovian Analysis of Reasoning Trajectories

Dake Bu, Wei Huang, Andi Han, Atsushi Nitanda, Bo Xue, Qingfu Zhang, Hau-San Wong, Taiji Suzuki

Affiliations City University of Hong Kong; Center for Advanced Intelligence Project, RIKEN; The Institute of Statistical Mathematics; University of Sydney; CFAR and IHPC, Agency for Science, Technology and Research (A*STAR); Nanyang Technological University; The University of Tokyo

AI summary Uses a Markov chain model to analyze how post-training strategies such as RLVR and ORM/PRM reinforce high-probability paths while forgetting rare but crucial reasoning steps, and proves that exploration strategies such as rejecting easy instances and KL regularization help preserve rare CoTs.

Abstract

Foundation models exhibit broad knowledge but limited task-specific reasoning, motivating post-training strategies such as RL with verifiable rewards (RLVR) and test-time scaling (TTS). While recent work highlights the role of exploration in improving pass@K, empirical evidence points to a paradox: RLVR and ORM/PRM typically reinforce existing paths rather than expanding the reasoning scope, raising the question of why exploration helps if no new patterns emerge. To reconcile this paradox, we adopt the perspective of Kim et al. (2025), viewing easy (e.g., simplifying a fraction) versus hard (e.g., discovering a symmetry) reasoning steps as high- versus low-probability Markov transitions. In this tractable model, pretraining corresponds to tree-graph discovery, while post-training corresponds to chain-of-thought (CoT) reweighting. We prove that both RLVR and ORM/PRM heavily favor a few high-probability paths and thereby forget rare-but-crucial CoTs. Building on this, we further prove that exploration strategies such as rejecting easy instances and KL regularization help preserve rare CoTs. Empirical simulations corroborate our theoretical results.

2601.04885 2026-06-15 cs.CL cs.AI cs.LG Updated version

CuMA: Aligning LLMs with Sparse Cultural Values via Demographic-Aware Mixture of Adapters

Ao Sun, Xiaoyu Wang, Zhe Tan, Yu Li, Jiachen Zhu, Yuheng Jia, Shu Su

Affiliations Southeast University; ByteDance Inc.; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China

AI summary Proposes CuMA, a framework that uses demographic-aware routing to separate conflicting gradients into expert subspaces, addressing the mean collapse of dense models in multicultural alignment, and achieves state-of-the-art performance on benchmarks including WorldValuesBench.

Comments ACL 2026 Main

Abstract

As Large Language Models (LLMs) serve a global audience, alignment must transition from enforcing universal consensus to respecting cultural pluralism. We demonstrate that dense models, when forced to fit conflicting value distributions, suffer from Mean Collapse, converging to a generic average that fails to represent diverse groups. We attribute this to Cultural Sparsity, where gradient interference prevents dense parameters from spanning distinct cultural modes. To resolve this, we propose CuMA (Cultural Mixture of Adapters), a framework that frames alignment as a conditional capacity separation problem. By incorporating demographic-aware routing, CuMA internalizes a Latent Cultural Topology to explicitly disentangle conflicting gradients into specialized expert subspaces. Extensive evaluations on WorldValuesBench, Community Alignment, and PRISM demonstrate that CuMA achieves state-of-the-art performance, significantly outperforming both dense baselines and semantic-only MoEs. Crucially, our analysis confirms that CuMA effectively mitigates mean collapse, preserving cultural diversity. Our code is available at https://github.com/Throll/CuMA.

2601.00821 2026-06-15 cs.AI cs.CL cs.IR Updated version

Verbatim Chunks Beat Extracted Artifacts: A Controlled Ablation of Memory Representations for Long LLM Conversations

Tao An

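Affiliations Hawaii Pacific University

AI summary A controlled ablation finds that verbatim conversation chunks beat LLM-extracted structured artifacts (facts, decisions, and so on) by 15.9 to 22.0 accuracy points in long-conversation memory retrieval, because extraction discards verbatim detail; structured memory should augment verbatim text rather than replace it.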

Comments v2: substantially revised -- reframed from a system paper to a controlled ablation study; title and conclusions updated accordingly. 26 pages, 5 figures

Abstract

A growing class of conversational-memory systems compresses dialogue history into structured artifacts -- extracted facts, decisions, or events -- on the premise that distilled structure retrieves better than raw text. We test this premise with a controlled ablation: within one fixed retrieval-rerank-reasoning pipeline, we swap only the stored representation -- LLM-extracted typed artifacts versus verbatim conversation chunks -- holding the model, retriever, reranker, and judge constant. Verbatim chunks win by 15.9 points on LoCoMo (43.9% vs. 28.0%) and 22.0 points on LongMemEval-S (67.4% vs. 45.4%); a 1-hop semantic graph does not recover the gap, and five confound controls reproduce the effect. The mechanism is lossy distillation: extraction discards verbatim detail that chunks retain for free, and the extracted-artifact pipeline never beats naive RAG in overall accuracy. Concurrent positive results with near-verbatim, provenance-preserving units fit the same account: retrieval accuracy tracks how far the representation departs from the source. For the extraction designs we test, structured memory should augment verbatim text rather than replace it: a chunks $\cup$ artifacts union store matches chunks on both benchmarks while artifacts alone forfeit the gap. Code and data: https://github.com/tao-hpu/cog-canvas

2410.15051 2026-06-15 cs.CL cs.LG Updated version

Automatic identification of diagnosis from hospital discharge letters via weakly supervised Natural Language Processing

Vittorio Torri, Elisa Barbieri, Anna Cantarutti, Carlo Giaquinto, Francesca Ieva

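Affiliations University of Bologna

AI summary Proposes a weakly supervised NLP pipeline that classifies diagnoses from Italian discharge letters without document-level annotation, approaching fully supervised performance on a bronchiolitis dataset while saving substantial manual annotation time.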

Comments 61 pages, 9 figures

Abstract

Identifying patient diagnoses from hospital discharge letters is essential for large-scale cohort selection and epidemiological research, but traditional supervised approaches require extensive manual annotation, which is often impractical for large textual datasets. We present a weakly supervised Natural Language Processing (NLP) pipeline for classifying Italian discharge letters without document-level manual annotation. The method extracts diagnosis-related sentences, generates semantic embeddings using a transformer model further pre-trained on Italian medical documents, and applies a two-level clustering procedure to derive weak labels that are then used to train a document-level classifier. The approach was evaluated in a case study on bronchiolitis using 33,176 discharge letters of children admitted to 44 emergency rooms or hospitals in the Veneto Region, Italy, between 2017 and 2020. The best weakly supervised model achieved an AUROC of 77.68% ($\pm4.30\%$), an AUPRC of 73.13% ($\pm4.93\%$), and an F1-score of 78.14% ($\pm4.89\%$) against manually annotated data. Performance surpassed unsupervised baselines and approached fully supervised models, while reducing the need for manual annotation by more than 1,500 hours for a dataset of this size. Similar model rankings were observed in a secondary validation on a smaller bronchitis dataset (3,188 discharge letters, 2020-2025), where the best weakly supervised model achieved an AUPRC of 76.72% ($\pm 5.02\%$). These results suggest the potential of weakly supervised NLP methods for scalable disease identification from clinical discharge letters.

2509.01455 2026-06-15 cs.CL Updated version

Trusted Uncertainty in Large Language Models: A Unified Framework for Confidence Calibration and Risk-Controlled Refusal

Markus Oehri, Giulia Conti, Kaviraj Pather, Alexandre Rossi, Laia Serra, Adrian Parody, Rogvi Johannesen, Aviaja Petersen, Arben Krasniqi

Affiliations University of Liechtenstein; University of the Republic of San Marino; University of Mauritius; International University of Monaco; University of Andorra; University of Gibraltar; University of the Faroe Islands; Ilisimatusarfik (University of Greenland); University of Prizren

AI summary Proposes UniCR, a framework that fuses multiple sources of uncertainty evidence into a calibrated probability of correctness and enforces a user-specified error budget through principled refusal without fine-tuning the base model; across short-form QA, code generation, and retrieval-augmented long-form QA it improves calibration metrics and lowers the area under the risk-coverage curve.

Comments arXiv admin note: This paper has been withdrawn by arXiv due to unverifiable authorship and affiliation

Abstract

Deployed language models must decide not only what to answer but also when not to answer. We present UniCR, a unified framework that turns heterogeneous uncertainty evidence including sequence likelihoods, self-consistency dispersion, retrieval compatibility, and tool or verifier feedback into a calibrated probability of correctness and then enforces a user-specified error budget via principled refusal. UniCR learns a lightweight calibration head with temperature scaling and proper scoring, supports API-only models through black-box features, and offers distribution-free guarantees using conformal risk control. For long-form generation, we align confidence with semantic fidelity by supervising on atomic factuality scores derived from retrieved evidence, reducing confident hallucinations while preserving coverage. Experiments on short-form QA, code generation with execution tests, and retrieval-augmented long-form QA show consistent improvements in calibration metrics, lower area under the risk-coverage curve, and higher coverage at fixed risk compared to entropy or logit thresholds, post-hoc calibrators, and end-to-end selective baselines. Analyses reveal that evidence contradiction, semantic dispersion, and tool inconsistency are the dominant drivers of abstention, yielding informative user-facing refusal messages. The result is a portable recipe of evidence fusion to calibrated probability to risk-controlled decision that improves trustworthiness without fine-tuning the base model and remains valid under distribution shift.

2512.22671 2026-06-15 cs.CL cs.AI cs.LG Updated version

Fragile Knowledge, Robust Instruction-Following: The Width Pruning Dichotomy in Llama-3.2

Pere Martra

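Affiliations Independent Researcher

AI summary Applies structured width pruning to GLU-MLP layers under the Peak-to-Peak Magnitude (PPM) criterion and finds that lowering the expansion ratio hurts tasks that rely on parametric knowledge but improves instruction following, challenging the assumption that pruning causes uniform degradation.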

Comments 22 pages, 5 figures, 9 tables. Code available at https://github.com/peremartra/llama-glu-expansion-pruning

Abstract

Structured width pruning of GLU-MLP layers in Llama-3.2 models, guided by the Peak-to-Peak Magnitude (PPM) criterion, reveals a systematic dichotomy in how reducing the expansion ratio affects different model capabilities. While performance on tasks relying on parametric knowledge (e.g., MMLU, GSM8K) and perplexity metrics degrades predictably with decreasing expansion ratios, instruction-following capabilities improve at the 2.4x equilibrium ratio (IFEval: +4.8 points / +46% in Llama-3.2-1B and +3.7 points / +39% in Llama-3.2-3B), and multi-step reasoning remains robust (MUSR). This pattern, observed consistently across both evaluated model sizes, challenges the prevailing assumption in compression research that pruning induces uniform degradation. To investigate this, we evaluated seven expansion ratio configurations using comprehensive benchmark suites that assess factual knowledge, mathematical reasoning, language comprehension, instruction-following, and truthfulness. Our analysis identifies the expansion ratio as a critical architectural parameter that selectively reshapes the model's task performance profile, rather than merely serving as a compression metric.
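
To make the mechanics of width pruning concrete, here is a minimal NumPy sketch of removing intermediate neurons from one GLU-MLP block. It assumes the expansion ratio means intermediate width divided by hidden width. The scoring rule is a stand-in that reads "peak-to-peak" literally as the maximum minus the minimum of each neuron's weights; the abstract does not define PPM, so this should not be taken as the paper's criterion. All names are hypothetical.

```python
import numpy as np

def prune_glu_mlp(w_gate, w_up, w_down, expansion_ratio):
    """Keep the highest-scoring intermediate neurons of a GLU-MLP.

    w_gate, w_up: (intermediate, hidden); w_down: (hidden, intermediate).
    """
    intermediate, hidden = w_gate.shape
    keep = min(int(round(expansion_ratio * hidden)), intermediate)
    # Stand-in importance score (assumption): weight range per neuron.
    score = (w_gate.max(axis=1) - w_gate.min(axis=1)) \
          + (w_up.max(axis=1) - w_up.min(axis=1))
    idx = np.sort(np.argsort(-score)[:keep])
    # A neuron is one row of gate/up and the matching column of down.
    return w_gate[idx], w_up[idx], w_down[:, idx]

def glu_mlp(x, w_gate, w_up, w_down):
    """SiLU-gated MLP forward pass for a single hidden vector x."""
    gate = w_gate @ x
    silu = gate / (1.0 + np.exp(-gate))
    return w_down @ (silu * (w_up @ x))

rng = np.random.default_rng(0)
hidden, intermediate = 10, 40                     # toy 4.0x expansion
w_gate = rng.normal(size=(intermediate, hidden))
w_up = rng.normal(size=(intermediate, hidden))
w_down = rng.normal(size=(hidden, intermediate))
g, u, d = prune_glu_mlp(w_gate, w_up, w_down, expansion_ratio=2.4)
y = glu_mlp(rng.normal(size=hidden), g, u, d)
```

The pruned block keeps the same input and output width, so it can replace the original layer without touching the rest of the network; only the intermediate dimension shrinks.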

2512.22484 2026-06-15 cs.RO math.DG Updated version

Asymmetric Friction in Geometric Locomotion

Ross L. Hatton, Yousef Salaman, Shai Revzen

Affiliations Robotics program at Oregon State University; Department of Electrical Engineering and Computer Science at the University of Michigan

AI summary Introduces asymmetric friction into geometric locomotion models by replacing Riemannian metrics with Finsler metrics, and extends the sub-Riemannian approach to a sub-Finslerian one to characterize a system's motion capabilities.

Comments 23 pages, 15 figures

Abstract

Geometric mechanics models of locomotion have provided insight into how robots and animals use environmental interactions to convert internal shape changes into displacement through the world, encoding this relationship in a "motility map". A key class of such motility maps arises from (possibly anisotropic) linear drag acting on the system's individual body parts, formally described via Riemannian metrics on the motions of the system's individual body parts. The motility map can then be generated by invoking a sub-Riemannian constraint on the aggregate system motion under which the position velocity induced by a given shape velocity is that which minimizes the power dissipated via friction. The locomotion of such systems is "geometric" in the sense that the final position reached by the system depends only on the sequence of shapes that the system passes through, but not on the rate with which the shape changes are made. In this paper, we consider a far more general class of systems in which the drag may be not only anisotropic (with different coefficients for forward/backward and left/right motions), but also asymmetric (with different coefficients for forward and backward motions). Formally, including asymmetry in the friction replaces the Riemannian metrics on the body parts with Finsler metrics. We demonstrate that the sub-Riemannian approach to constructing the system motility map extends naturally to a sub-Finslerian approach and identify system properties analogous to the constraint curvature of sub-Riemannian systems that allow for the characterization of the system motion capabilities.
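
The difference between a Riemannian and a Finsler description of drag can be illustrated in one dimension. The block below is an assumed toy example, not taken from the paper: $c$, $c_{+}$ and $c_{-}$ are hypothetical positive drag coefficients, and the dissipated power is taken to be the square of the norm.

```latex
% Symmetric (Riemannian) drag with a single coefficient c > 0:
\[
  \|v\|_{R} = \sqrt{c}\,|v|, \qquad \|{-v}\|_{R} = \|v\|_{R}.
\]
% Asymmetric (Finsler) drag with separate forward and backward
% coefficients c_{+}, c_{-} > 0:
\[
  F(v) =
  \begin{cases}
    \sqrt{c_{+}}\, v,  & v \ge 0, \\
    -\sqrt{c_{-}}\, v, & v < 0,
  \end{cases}
  \qquad
  F(\lambda v) = \lambda F(v) \ \text{for } \lambda > 0,
  \qquad
  F(-v) \neq F(v) \ \text{when } c_{+} \neq c_{-},\ v \neq 0.
\]
```

The Finsler norm keeps positive homogeneity, which is what preserves the rate independence described in the abstract, while giving up the symmetry under $v \mapsto -v$ that a quadratic form forces.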

2512.14967 2026-06-15 cs.LG q-fin.CP q-fin.MF Updated version

Deep Learning and Elicitability for McKean-Vlasov FBSDEs With Common Noise

Felipe J. P. Antunes, Yuri F. Saporito, Sebastian Jaimungal

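Affiliations School of Applied Mathematics, Getulio Vargas Foundation; Department of Statistical Sciences, University of Toronto; Oxford-Man Institute for Quantitative Finance, University of Oxford

AI summary Proposes a method combining Picard iterations, elicitability, and deep learning to solve McKean-Vlasov forward-backward stochastic differential equations with common noise; elicitability yields a pathwise loss function that avoids nested Monte Carlo, and accuracy is validated on a systemic-risk model and an economic growth model.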

Comments 19 pages, 8 figures

Abstract

We present a novel numerical method for solving McKean-Vlasov forward-backward stochastic differential equations (MV-FBSDEs) with common noise, combining Picard iterations, elicitability and deep learning. The key innovation involves elicitability to derive a pathwise loss function, enabling efficient training of neural networks to approximate both the backward process and the conditional expectations arising from common noise, without requiring computationally expensive nested Monte Carlo simulations. The mean-field interaction term is parameterized via a recurrent neural network trained to minimize an elicitable score, while the backward process is approximated through a hybrid feedforward and recurrent network representing the decoupling field. We validate the algorithm on a systemic-risk inter-bank borrowing and lending model, where analytical solutions exist, demonstrating accurate recovery of the true solution. We further extend the model to quantile-mediated interactions, showcasing the flexibility of the elicitability framework beyond conditional means or moments. Finally, we apply the method to a non-stationary Aiyagari-Bewley-Huggett economic growth model with endogenous interest rates, illustrating its applicability to complex mean-field games without closed-form solutions.
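
Elicitability means a statistic can be recovered as the minimizer of an expected score, which is what allows a pathwise loss to replace nested simulation. The sketch below illustrates the two textbook cases on a toy sample: the squared score elicits the mean and the pinball score elicits a quantile. It is an assumed illustration of the general principle with hypothetical names, not the paper's loss or network.

```python
import numpy as np

def squared_score(z, y):
    """Average squared error of the constant forecast z; minimized by the mean."""
    return np.mean((z - y) ** 2)

def pinball_score(z, y, tau):
    """Average pinball (quantile) loss of the constant forecast z at level tau."""
    diff = y - z
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))

def argmin_on_grid(score, grid):
    """Return the grid point with the smallest score."""
    return grid[int(np.argmin([score(z) for z in grid]))]

y = np.arange(1, 102)        # toy sample 1, 2, ..., 101
grid = np.arange(0, 111)     # candidate constant forecasts

mean_hat = argmin_on_grid(lambda z: squared_score(z, y), grid)
q90_hat = argmin_on_grid(lambda z: pinball_score(z, y, 0.9), grid)
```

Here the squared score is minimized at the sample mean and the pinball score at the 0.9 sample quantile. In a learning setting the constant forecast would be replaced by a network output and the grid search by gradient descent on the same score.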

2512.14366 2026-06-15 cs.CV Updated version

Optimizing Rank for High-Fidelity Implicit Neural Representations

Julian McGinnis, Florian A. Hölzl, Suprosanna Shit, Florentin Bieder, Paul Friedrich, Mark Mühlau, Bjoern Menze, Daniel Rueckert, Benedikt Wiestler

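Affiliations University of Cambridge

AI summary Shows that regulating stable rank during training lets even simple MLPs achieve high-fidelity implicit neural representations; using the Muon optimizer substantially improves performance, with PSNR gains of up to 9 dB across a range of tasks.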

Abstract

Implicit Neural Representations (INRs) based on vanilla Multi-Layer Perceptrons (MLPs) are widely believed to be incapable of representing high-frequency content. This has directed research efforts towards architectural interventions, such as coordinate embeddings or specialized activation functions, to represent high-frequency signals. In this paper, we challenge the notion that the low-frequency bias of vanilla MLPs is an intrinsic architectural limitation on learning high-frequency content, and argue instead that it is a symptom of stable rank degradation during training. We empirically demonstrate that regulating the network's rank during training substantially improves the fidelity of the learned signal, rendering even simple MLP architectures expressive. Extensive experiments show that using optimizers like Muon, with high-rank, near-orthogonal updates, consistently enhances INR architectures even beyond simple ReLU MLPs. These substantial improvements hold across a diverse range of domains, including natural and medical images and novel view synthesis, with up to +9 dB PSNR over the same architecture. Code is available at (https://rank-inrs.github.io).
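
For readers unfamiliar with the quantity, the stable rank of a weight matrix is commonly defined as the squared Frobenius norm divided by the squared spectral norm. The sketch below computes it with NumPy under that standard definition; whether the paper tracks exactly this quantity per layer is an assumption.

```python
import numpy as np

def stable_rank(w):
    """Stable rank ||W||_F^2 / ||W||_2^2 from the singular values.

    It lies between 1 and rank(W) and, unlike the exact rank, changes
    smoothly as small singular values shrink.
    """
    singular_values = np.linalg.svd(w, compute_uv=False)  # descending order
    return float(np.sum(singular_values ** 2) / singular_values[0] ** 2)

identity_like = np.eye(4)                               # all directions equal
rank_one = np.outer(np.arange(1.0, 5.0), np.ones(4))    # one dominant direction

print(stable_rank(identity_like), stable_rank(rank_one))
```

An orthogonal or identity-like matrix attains the maximum value, while a matrix dominated by one direction collapses towards 1, which is the kind of degradation the abstract refers to.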

2512.13069 2026-06-15 cs.LG physics.flu-dyn stat.ML Updated version

Multi-fidelity aerodynamic data fusion by autoencoder transfer learning

Javier Nieto-Centenero, Esther Andrés, Rodrigo Castellanos

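Affiliations Department of Aerospace Engineering, UC3M; Theoretical and Computational Aerodynamics Group, Flight Physics Department, INTA

AI summary Proposes a multi-fidelity deep learning framework that combines autoencoder transfer learning with Multi-Split Conformal Prediction: low-fidelity data is used to learn a latent physics representation, and the decoder is fine-tuned on a very small amount of high-fidelity data to give accurate aerodynamic pressure predictions with uncertainty bands whose pointwise coverage exceeds 95%.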

Comments 27 pages, 13 figures

Abstract

Accurate aerodynamic prediction often relies on high-fidelity simulations; however, their prohibitive computational costs severely limit their applicability in data-driven modeling. This limitation motivates the development of multi-fidelity strategies that leverage inexpensive low-fidelity information without compromising accuracy. Addressing this challenge, this work presents a multi-fidelity deep learning framework that combines autoencoder-based transfer learning with a newly developed Multi-Split Conformal Prediction (MSCP) strategy to achieve uncertainty-aware aerodynamic data fusion under extreme data scarcity. The methodology leverages abundant Low-Fidelity (LF) data to learn a compact latent physics representation, which acts as a frozen knowledge base for a decoder that is subsequently fine-tuned using scarce High-Fidelity (HF) samples. Tested on surface-pressure distribution databases for NACA airfoils (2D) and a transonic wing (3D), the model successfully corrects LF deviations and achieves high-accuracy pressure predictions using minimal HF training data. Furthermore, the MSCP framework produces robust, actionable uncertainty bands with pointwise coverage exceeding 95%. By combining extreme data efficiency with uncertainty quantification, this work offers a scalable and reliable solution for aerodynamic regression in data-scarce environments.
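
The abstract does not describe how MSCP combines its splits, so the sketch below shows only the standard single-split conformal construction that such methods build on: absolute residuals on a held-out calibration set give a band half-width with finite-sample coverage of at least 1 - alpha under exchangeability. Function names are hypothetical, and this is not the authors' MSCP procedure.

```python
import math
import numpy as np

def conformal_halfwidth(cal_residuals, alpha):
    """Half-width of a split-conformal band from calibration residuals.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest absolute residual;
    returns infinity when the calibration set is too small for alpha.
    """
    scores = np.sort(np.abs(np.asarray(cal_residuals, dtype=float)))
    n = scores.size
    # The small tolerance guards against floating-point rounding upward.
    k = math.ceil((n + 1) * (1.0 - alpha) - 1e-9)
    if k > n:
        return math.inf
    return float(scores[k - 1])

def prediction_band(pred, halfwidth):
    """Symmetric band around the point predictions."""
    pred = np.asarray(pred, dtype=float)
    return pred - halfwidth, pred + halfwidth

residuals = np.arange(1, 20)                      # toy residuals 1, ..., 19
width_95 = conformal_halfwidth(residuals, alpha=0.05)
lower, upper = prediction_band([0.0, 10.0], width_95)
```

With scarce high-fidelity data the calibration set is small, and the rule makes the cost visible: fewer than 19 calibration points cannot certify a 95% band at all, which is one plausible motivation for reusing data across several splits.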

2512.05025 2026-06-15 cs.CV Updated version

RAMEN: Resolution-Adjustable Multimodal Encoder for Earth Observation

Nicolas Houdré, Diego Marcos, Hugo Riffaud de Turckheim, Dino Ienco, Laurent Wendling, Camille Kurtz, Sylvain Lobry

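Affiliations Institut National des Sciences de l'Univers (INSU), France; CNRS, France

AI summary Proposes RAMEN, a sensor-agnostic, resolution-adjustable multimodal encoder that treats resolution as a controllable parameter to enable coherent analysis of multimodal Earth observation data in a unified latent space, and outperforms existing models on the PANGAEA benchmark.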

Journal ref CVPR 2026

Abstract

Earth observation (EO) data spans a wide range of spatial, spectral, and temporal resolutions, from high-resolution optical imagery to low-resolution multispectral products or radar time series. While recent foundation models have improved multimodal integration for learning meaningful representations, they often expect fixed input resolutions or are based on sensor-specific encoders, limiting generalization across heterogeneous EO modalities. To overcome these limitations, we introduce RAMEN, a resolution-adjustable multimodal encoder that learns a shared visual representation across EO data in a fully sensor-agnostic manner. RAMEN treats the modality and the spatial and temporal resolutions as key input data features, enabling coherent analysis across modalities within a unified latent space. Its main methodological contribution is to define spatial resolution as a controllable output parameter, giving users direct control over the desired level of detail at inference and allowing explicit trade-offs between spatial precision and computational cost. We train a single, unified transformer encoder to reconstruct masked multimodal EO data drawn from diverse sources, ensuring generalization across sensors and resolutions. Once pretrained, RAMEN transfers effectively to both known and unseen sensor configurations and outperforms larger state-of-the-art models on the community-standard PANGAEA benchmark, which contains various multi-sensor and multi-resolution downstream tasks. Our code and pretrained model are available at https://github.com/nicolashoudre/RAMEN.

2512.04981 2026-06-15 cs.CV cs.LG Updated version

Aligned but Stereotypical? How System Prompts Shape Demographic Bias in LLM-Based Text-to-Image Models

NaHyeon Park, Na Min An, Kunhee Kim, Soyeon Yoon, Jiahao Huo, Hyunjung Shim

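Affiliations KAIST; HKUST (GZ)

AI summary Studies how LLM-augmented text-to-image systems introduce implicit demographic bias during prompt expansion, and proposes FairPro, a training-free debiasing framework that adaptively generates fairness-aware instructions to reduce demographic disparities.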

Comments Project page: https://fairpro-t2i.github.io

Abstract

Text-to-image (T2I) systems increasingly rely on Large Language Model (LLM)-based text conditioning to interpret and expand user prompts. While this improves prompt understanding and text-image alignment, we find that it can also introduce implicit demographic assumptions, even when demographic attributes are unspecified. To systematically investigate this behavior across varying levels of prompt ambiguity and complexity, we construct a comprehensive benchmark covering diverse prompt settings. Evaluations on eight recent T2I models show that LLM-based systems consistently exhibit stronger demographic skew than non-LLM-based baselines. We further analyze system prompts, a component unique to LLM-based T2I systems that guides prompt interpretation and expansion. Our analyses show that these instructions strongly influence text embeddings, which subsequently leads to biased image generations. Motivated by these findings, we propose FairPro, a training-free debiasing framework that adaptively generates fairness-aware instructions while preserving user intent. Experiments demonstrate that FairPro substantially reduces demographic disparities while maintaining prompt fidelity.

2512.03787 2026-06-15 cs.LG version update

Adaptive Identification and Modeling of Clinical Pathways with Process Mining

基于过程挖掘的临床路径自适应识别与建模

Francesco Vitale, Nicola Mazzocca

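Affiliations * University of Naples Federico II

AI summary Proposes a two-phase process mining method that extends a clinical pathway knowledge base using conformance checking diagnostics, enabling adaptive identification and modeling; reaches 95.62% AUC and 67.11% arc-degree simplicity on the Synthea dataset.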

Comments Accepted to the 41st ACM/SIGAPP Symposium On Applied Computing (ACM SAC 2026)

Details
AI Chinese abstract

临床路径是模拟患者治疗过程的专门医疗计划。它们旨在提供基于标准的进展并标准化患者治疗,从而改善护理、减少资源使用并加速患者康复。然而,基于临床指南和领域专业知识手动建模这些路径是困难的,并且可能无法反映针对不同疾病变异或组合的实际最佳实践。我们提出了一种使用过程挖掘的两阶段建模方法,通过利用一致性检查诊断来扩展临床路径知识库。在第一阶段,收集给定疾病的历史数据,以过程模型的形式捕获治疗。在第二阶段,将新数据与参考模型进行比较以验证一致性。基于一致性检查结果,知识库可以扩展为针对新变异或疾病组合定制的更具体模型。我们使用Synthea(一个模拟SARS-CoV-2感染患者治疗并伴有不同COVID-19并发症的基准数据集)展示了我们的方法。结果表明,我们的方法能够以足够的精度扩展临床路径知识库,AUC峰值达到95.62%,同时保持67.11%的弧阶简单性。

English abstract

Clinical pathways are specialized healthcare plans that model patient treatment procedures. They are developed to provide criteria-based progression and standardize patient treatment, thereby improving care, reducing resource use, and accelerating patient recovery. However, manual modeling of these pathways based on clinical guidelines and domain expertise is difficult and may not reflect the actual best practices for different variations or combinations of diseases. We propose a two-phase modeling method using process mining, which extends the knowledge base of clinical pathways by leveraging conformance checking diagnostics. In the first phase, historical data for a given disease is collected to capture treatment in the form of a process model. In the second phase, new data is compared against the reference model to verify conformance. Based on the conformance checking results, the knowledge base can be expanded with more specific models tailored to new variants or disease combinations. We demonstrate our approach using Synthea, a benchmark dataset simulating patient treatments for SARS-CoV-2 infections with varying COVID-19 complications. The results show that our method enables expanding the knowledge base of clinical pathways with sufficient precision, peaking at 95.62% AUC while maintaining an arc-degree simplicity of 67.11%.
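The second phase described above, comparing new traces against a reference model, can be illustrated with a deliberately simplified sketch. It assumes the reference model is just the set of directly-follows activity pairs seen in historical traces; the paper's actual process model and conformance metric are not specified in the abstract, and the activity names and the `fitness` helper below are hypothetical.

```python
def directly_follows(traces):
    """Collect every (a, b) pair of consecutive activities in the reference log."""
    pairs = set()
    for trace in traces:
        padded = ["<start>", *trace, "<end>"]
        pairs.update(zip(padded, padded[1:]))
    return pairs


def fitness(trace, model):
    """Fraction of a trace's steps that the reference model allows (1.0 = conforming)."""
    padded = ["<start>", *trace, "<end>"]
    steps = list(zip(padded, padded[1:]))
    return sum(step in model for step in steps) / len(steps)


# Phase 1: build a reference model from historical traces (toy data).
reference_log = [
    ["admission", "test", "isolation", "discharge"],
    ["admission", "test", "oxygen", "discharge"],
]
model = directly_follows(reference_log)

# Phase 2: check new traces against the reference model.
conforming = ["admission", "test", "oxygen", "discharge"]
deviating = ["admission", "test", "ventilation", "discharge"]
print(fitness(conforming, model), fitness(deviating, model))
```

In this toy setting, a low-fitness trace such as `deviating` would be a candidate for a new, more specific pathway model in the knowledge base.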

2501.08561 2026-06-15 cs.AI cs.HC cs.LG cs.SC version update

ANSR-DT: A Neuro-Symbolic Framework for Adaptive and Explainable Digital Twins

ANSR-DT:一种自适应可解释数字孪生的神经符号框架

Safayat Bin Hakim, Muhammad Adil, Alvaro Velasquez, Houbing Herbert Song

Affiliations * Department of Information Systems, University of Maryland Baltimore County; Department of Computer Science and Engineering, University at Buffalo; Department of Computer Science, University of Colorado Boulder

AI summary Proposes the ANSR-DT framework, which combines a CNN-LSTM, Prolog reasoning, and PPO reinforcement learning to provide anomaly detection, symbolic reasoning, and adaptive decision-making for digital twins, with strong results on multiple benchmarks.

Comments Code available at https://github.com/sbhakim/ansr-dt

Details
AI Chinese abstract

数字孪生越来越多地用于监控和优化工业系统,然而许多现有框架仍然难以解释、适应缓慢,并且整合显式领域知识的能力有限。本文提出了ANSR-DT,一种自适应神经符号框架,它在单一数字孪生流水线中统一了时序异常检测、符号推理和基于强化学习的决策支持。ANSR-DT将用于多变量模式识别的CNN-LSTM模型与基于Prolog的推理相结合,后者将学习到的信号转换为显式规则,从而实现透明的诊断和可追溯的决策路径。基于PPO的适应层进一步在变化条件下优化操作响应,同时保持可解释性。在8个基线模型上的实验表明,ANSR-DT在提供竞争性预测性能的同时,还能实现稳定的规则提取、可扩展的符号推理和可操作的解释。在Skoltech异常基准(SKAB)上的额外验证进一步表明,该框架能够迁移到合成场景之外。这些发现使ANSR-DT成为可信、自适应和可解释的工业数字孪生的实用基础。

English abstract

Digital twins are increasingly used to monitor and optimize industrial systems, yet many existing frameworks remain difficult to interpret, slow to adapt, and limited in their ability to incorporate explicit domain knowledge. This paper presents ANSR-DT, an adaptive neuro-symbolic framework that unifies temporal anomaly detection, symbolic reasoning, and reinforcement-learning-based decision support within a single digital twin pipeline. ANSR-DT combines a CNN-LSTM model for multivariate pattern recognition with Prolog-based reasoning that converts learned signals into explicit rules, enabling transparent diagnoses and traceable decision paths. A PPO-based adaptation layer further refines operational responses under changing conditions while preserving interpretability. Experiments against 8 baselines show that ANSR-DT delivers competitive predictive performance together with stable rule extraction, scalable symbolic reasoning, and actionable explanations. Additional validation on the Skoltech Anomaly Benchmark (SKAB) further indicates that the framework transfers beyond synthetic settings. These findings position ANSR-DT as a practical foundation for trustworthy, adaptive, and explainable industrial digital twins.

2511.19656 2026-06-15 cs.LG math.OC stat.ML version update

Lower Complexity Bounds for Nonconvex-Strongly-Convex Bilevel Optimization with First-Order Oracles

非凸-强凸双层优化的一阶Oracle下界复杂度

Kaiyi Ji

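Affiliations * Kaiyi Ji

AI summary For smooth nonconvex-strongly-convex bilevel optimization, proves lower bounds of $\Omega(\kappa^{3/2}\epsilon^{-2})$ and $\Omega(\kappa^{5/2}\epsilon^{-4})$ under deterministic and stochastic first-order oracle models, respectively, improving on the best known lower bounds for single-level nonconvex optimization and min-max problems.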

Comments Accepted by ICML 2026

Details
AI Chinese abstract

尽管双层优化的上界保证已被广泛研究,但由于双层结构的复杂性,下界方面的进展有限。本文关注光滑非凸-强凸设定,并开发了新的困难实例,在确定性和随机一阶Oracle模型下得到了非平凡的下界。在确定性情形下,我们证明任何一阶零尊重算法至少需要$\Omega(\kappa^{3/2}\epsilon^{-2})$次Oracle调用才能找到$\epsilon$-精确的稳定点,改进了单层非凸优化和非凸-强凸极小极大问题已知的最优下界。在随机情形下,我们证明至少需要$\Omega(\kappa^{5/2}\epsilon^{-4})$次随机Oracle调用,同样强化了相关设定中的已知最优下界。我们的结果揭示了当前双层优化上下界之间的显著差距,并表明即使在简化设定(如二次下层目标)下,仍需进一步研究以理解标准一阶Oracle下双层优化的最优复杂度。

English abstract

Although upper bound guarantees for bilevel optimization have been widely studied, progress on lower bounds has been limited due to the complexity of the bilevel structure. In this work, we focus on the smooth nonconvex-strongly-convex setting and develop new hard instances that yield nontrivial lower bounds under deterministic and stochastic first-order oracle models. In the deterministic case, we prove that any first-order zero-respecting algorithm requires at least $\Omega(\kappa^{3/2}\epsilon^{-2})$ oracle calls to find an $\epsilon$-accurate stationary point, improving on the best known lower bounds for single-level nonconvex optimization and for nonconvex-strongly-convex min-max problems. In the stochastic case, we show that at least $\Omega(\kappa^{5/2}\epsilon^{-4})$ stochastic oracle calls are necessary, again strengthening the best known bounds in related settings. Our results expose substantial gaps between current upper and lower bounds for bilevel optimization and suggest that even simplified regimes, such as those with quadratic lower-level objectives, warrant further investigation toward understanding the optimal complexity of bilevel optimization under standard first-order oracles.
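For orientation, the setting can be written out as follows. This is a sketch in commonly used notation, not the paper's own statement: the symbols $f$, $g$, $\Phi$, $L$, $\mu$, and the stationarity criterion are assumptions, including the reading of $\kappa$ as the condition number $L/\mu$ of the lower-level problem. Only the two bounds in the last line are taken from the abstract.

```latex
\begin{aligned}
&\min_{x \in \mathbb{R}^{d_x}} \; \Phi(x) := f\bigl(x, y^*(x)\bigr),
 \qquad y^*(x) = \arg\min_{y \in \mathbb{R}^{d_y}} g(x, y), \\
&g(x, \cdot) \text{ is } \mu\text{-strongly convex}, \quad
 f, g \text{ are } L\text{-smooth}, \quad \kappa := L/\mu, \\
&\epsilon\text{-accurate stationary point: } \|\nabla \Phi(x)\| \le \epsilon, \\
&T_{\mathrm{det}} = \Omega\bigl(\kappa^{3/2}\epsilon^{-2}\bigr), \qquad
 T_{\mathrm{stoc}} = \Omega\bigl(\kappa^{5/2}\epsilon^{-4}\bigr).
\end{aligned}
```

Here $T_{\mathrm{det}}$ and $T_{\mathrm{stoc}}$ are hypothetical names for the number of deterministic and stochastic oracle calls, respectively.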

2511.14897 2026-06-15 cs.CV cs.LG version update

HULFSynth : An INR based Super-Resolution and Ultra Low-Field MRI Synthesis via Contrast factor estimation

HULFSynth: 基于隐式神经表示的超分辨率和超低场MRI合成,通过对比因子估计

Pranav Indrakanti, Luca Trautmann, Ivor Simpson

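Affiliations * LILI Lab, University of Sussex, Brighton, UK

AI summary Proposes an unsupervised single-image bidirectional MRI synthesizer that converts between high-field and ultra-low-field images by estimating tissue-type signal-to-noise ratios from a physics-based model, uses an implicit neural representation network for super-resolution, and verifies contrast improvements on synthetic and real data.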

Comments Medical Image Understanding and Analysis, MIUA 2026

Details
AI Chinese abstract

我们提出了一种无监督的单图像双向磁共振图像(MRI)合成器,它可以从高场(HF)幅度图像合成类似超低场(ULF)的图像,反之亦然。与现有的MRI合成模型不同,我们的方法受驱动HF和ULF MRI之间对比度变化的物理原理启发。我们的前向模型通过基于目标对比度值估计组织类型信噪比(SNR)值来模拟HF到ULF的变换。对于超分辨率任务,我们使用隐式神经表示(INR)网络,通过同时预测组织类型分割和图像强度来合成HF图像,而无需观察到的HF数据。所提出的方法使用从标准3T T1加权图像生成的合成ULF样数据进行定性评估,并使用配对的3T-64mT T1加权图像进行验证实验。在合成ULF样图像中,白质-灰质对比度提高了52%,在64mT图像中提高了37%。敏感性实验证明了我们的前向模型对目标对比度、噪声和初始种子的变化的鲁棒性。

English abstract

We present an unsupervised single-image bidirectional Magnetic Resonance Image (MRI) synthesizer that synthesizes an Ultra-Low Field (ULF)-like image from a High-Field (HF) magnitude image and vice versa. Unlike existing MRI synthesis models, our approach is inspired by the physics that drives contrast changes between HF and ULF MRIs. Our forward model simulates an HF-to-ULF transformation by estimating the tissue-type Signal-to-Noise Ratio (SNR) values based on target contrast values. For the super-resolution task, we used an Implicit Neural Representation (INR) network to synthesize an HF image by simultaneously predicting tissue-type segmentations and image intensity without observed HF data. The proposed method is evaluated using synthetic ULF-like data generated from standard 3T T$_1$-weighted images for qualitative assessments and paired 3T-64mT T$_1$-weighted images for validation experiments. WM-GM contrast improved by 52% in synthetic ULF-like images and 37% in 64mT images. Sensitivity experiments demonstrated the robustness of our forward model to variations in target contrast, noise, and initial seeding.

2511.12193 2026-06-15 cs.CV version update

MMRINet: Efficient Mamba-Based Segmentation with Dual-Path Refinement for Low-Resource MRI Analysis

MMRINet: 基于Mamba的高效双路径细化分割网络用于低资源MRI分析

Abdelrahman Elsayed, Ahmed Jaheen, Mohammad Yaqub

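Affiliations * Mohamed bin Zayed University of Artificial Intelligence; New York University

AI summary Proposes MMRINet, a lightweight Mamba-based segmentation network that uses dual-path feature refinement and progressive feature aggregation to segment efficiently in low-resource MRI analysis with about 2.5M parameters, outperforming baselines such as UNETR on the SSA dataset.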

Comments Accepted at The Medical Image Understanding and Analysis Conference (MIUA 2026)

Details
AI Chinese abstract

在多参数MRI中自动分割脑肿瘤在资源受限的临床环境中仍然是一个关键但未得到充分解决的挑战,因为需要高端GPU的深度3D网络不可行。这在撒哈拉以南非洲(SSA)尤为突出,低场扫描仪、异质患者群体和严重的数据稀缺加剧了应用标准深度学习管道的难度。我们提出了MMRINet,一种专为这些约束条件设计的轻量级分割架构。其核心是用线性复杂度的Mamba状态空间模型替代二次复杂度的自注意力,从而在不增加基于Transformer方法的计算开销的情况下实现高效的长程体积上下文建模。我们结合了两个轻量级细化组件:双路径特征细化(DPFR),提取互补的细节和上下文表示以改善有限数据下的特征多样性;以及渐进特征聚合(PFA),分层融合多尺度解码器输出以获得更清晰的分割边界。在包含来自尼日利亚临床站点的3D MRI扫描的BraTS-Lighthouse SSA 2025挑战数据集上评估,MMRINet仅用约2.5M参数就实现了平均Dice分数0.752和平均HD95 12.23 mm,优于所有评估的基线,包括UNETR、Swin-UNETR、SegMamba和SegResNet3D。这些结果表明,可以在大幅减少计算的情况下实现强大的验证集分割性能,为低资源临床环境中AI辅助神经肿瘤学提供了实用的一步。我们的GitHub仓库可在此访问:BioMedIA-MBZUAI/MMRINet。

English abstract

Automated brain tumor segmentation in multi-parametric MRI remains a critical yet underserved challenge in resource-constrained clinical settings, where deep 3D networks requiring high-end GPUs are not viable. This is particularly acute across sub-Saharan Africa (SSA), where low-field scanners, heterogeneous patient demographics, and severe data scarcity compound the difficulty of applying standard deep learning pipelines. We present MMRINet, a lightweight segmentation architecture purpose-built for these constraints. At its core, MMRINet replaces quadratic-complexity self-attention with linear-complexity Mamba state-space models, enabling efficient long-range volumetric context modeling without the computational overhead of Transformer-based approaches. We combine two lightweight refinement components: Dual-Path Feature Refinement (DPFR), which extracts complementary detail and contextual representations to improve feature diversity under limited data, and Progressive Feature Aggregation (PFA), which hierarchically fuses multi-scale decoder outputs for sharper segmentation boundaries. Evaluated on the BraTS-Lighthouse SSA 2025 challenge dataset, comprising 3D MRI scans from Nigerian clinical sites, MMRINet achieves an average Dice score of 0.752 and an average HD95 of 12.23 mm with only ~2.5M parameters, outperforming all evaluated baselines, including UNETR, Swin-UNETR, SegMamba, and SegResNet3D. These results indicate that strong validation-set segmentation performance can be achieved with substantially reduced computation, offering a practical step toward AI-assisted neuro-oncology in low-resource clinical environments. Our GitHub repository can be accessed here: BioMedIA-MBZUAI/MMRINet.
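The Dice score reported above is the standard overlap measure 2|A ∩ B| / (|A| + |B|) between a predicted and a reference segmentation. The sketch below computes it for one class, assuming each mask is given as a collection of voxel coordinates; the function name and the toy masks are illustrative and not taken from the MMRINet code.

```python
def dice_score(pred_voxels, target_voxels):
    """Dice coefficient between two masks given as collections of voxel coordinates."""
    pred, target = set(pred_voxels), set(target_voxels)
    if not pred and not target:
        # Convention chosen here: two empty masks count as a perfect match.
        return 1.0
    return 2 * len(pred & target) / (len(pred) + len(target))


# Toy 3D masks: two of three voxels overlap.
pred = [(0, 0, 0), (0, 0, 1), (0, 1, 1)]
target = [(0, 0, 1), (0, 1, 1), (1, 1, 1)]
print(round(dice_score(pred, target), 3))
```

An average Dice score such as the reported 0.752 would then be a mean of per-class or per-case values; the exact averaging scheme is not stated in the abstract.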

2507.10834 2026-06-15 cs.LG version update

From Small to Large: A Graph Convolutional Network Approach for Solving Assortment Optimization Problems

从小到大:一种用于解决分类优化问题的图卷积网络方法

Guokai Li, Pin Gao, Stefanus Jasin, Zizhuo Wang

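Affiliations * Smith School of Business, Queen’s University; School of Data Science, The Chinese University of Hong Kong, Shenzhen; Stephen M. Ross School of Business, University of Michigan

AI summary Proposes a graph convolutional network (GCN) framework for efficiently solving constrained assortment optimization problems. A graph representation is used to learn the mapping from problem parameters to optimal assortments, training on small instances generalizes to large-scale problems, and in numerical experiments a model trained on 20-product instances achieves over 85% of the optimal revenue on 2,000-product problems.

Details
AI Chinese abstract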

分类优化旨在从可替代产品中选择一个子集,在约束条件下最大化期望收益。由于组合和非线性性质,该问题是NP难的,并且在电子商务等行业中频繁出现,平台每分钟需要解决数千个此类问题。我们提出了一种图卷积网络(GCN)框架来高效求解约束分类优化问题。我们的方法构建问题的图表示,训练GCN学习从问题参数到最优分类的映射,并基于GCN的输出开发了三种推理策略。由于GCN能够跨实例规模泛化,从小规模样本中学到的模式可以迁移到大规模问题。我们建立了理论结果来证明所提出的GCN的表达能力,并解释了规模泛化能力的潜在机制。数值实验表明,在20个产品实例上训练的GCN能够在几秒内对多达2000个产品的问题实现超过85%的最优收益,在准确性和效率上均优于现有启发式方法。我们进一步将该框架扩展到使用交易数据的未知选择模型设置,并展示了类似的性能和可扩展性。

English abstract

Assortment optimization seeks to select a subset of substitutable products, subject to constraints, to maximize expected revenue. The problem is NP-hard due to its combinatorial and nonlinear nature and arises frequently in industries such as e-commerce, where platforms must solve thousands of such problems each minute. We propose a graph convolutional network (GCN) framework to efficiently solve constrained assortment optimization problems. Our approach constructs a graph representation of the problem, trains a GCN to learn the mapping from problem parameters to optimal assortments, and develops three inference policies based on the GCN's output. Owing to the GCN's ability to generalize across instance sizes, patterns learned from small-scale samples can be transferred to large-scale problems. Theoretical results are established to show the expressive power of the proposed GCN, and explain the underlying mechanism of the size generalization ability. Numerical experiments show that a GCN trained on instances with 20 products achieves over 85% of the optimal revenue on problems with up to 2,000 products within seconds, outperforming existing heuristics in both accuracy and efficiency. We further extend the framework to settings with an unknown choice model using transaction data and demonstrate similar performance and scalability.
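To make the objective concrete, here is a minimal brute-force sketch of constrained assortment optimization. It assumes a multinomial logit (MNL) choice model with a no-purchase weight of 1 and a simple cardinality constraint; the abstract does not say which choice model or constraints the paper uses, and the prices and preference weights are made up. Enumeration like this is only feasible for tiny instances, which is the scaling problem a learned solver is meant to avoid.

```python
from itertools import combinations


def expected_revenue(assortment, prices, weights):
    """Expected revenue of an assortment under an MNL model (no-purchase weight = 1)."""
    denom = 1.0 + sum(weights[i] for i in assortment)
    return sum(prices[i] * weights[i] for i in assortment) / denom


def best_assortment(prices, weights, capacity):
    """Enumerate all assortments with at most `capacity` products and keep the best."""
    best = ((), 0.0)
    for k in range(1, capacity + 1):
        for subset in combinations(range(len(prices)), k):
            revenue = expected_revenue(subset, prices, weights)
            if revenue > best[1]:
                best = (subset, revenue)
    return best


prices = [10.0, 8.0, 6.0]   # hypothetical product prices
weights = [0.5, 1.0, 1.5]   # hypothetical MNL preference weights
subset, revenue = best_assortment(prices, weights, capacity=2)
print(subset, revenue)
```

The "percentage of optimal revenue" reported for a heuristic is then its revenue divided by the revenue of the subset found by an exact method like this one.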

2511.09789 2026-06-15 cs.LG version update

Trend-Aware Multi-Task Learning for Short-Term Energy Forecasting

CaReTS:统一分类与回归的多任务时间序列预测框架

Fulong Yao, Wanqing Zhao, Chao Zheng, Xiaofei Han

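Affiliations * Cardiff University; Newcastle University; University of Leeds

AI summary Proposes the CaReTS multi-task framework, which uses a dual-stream architecture to jointly classify trends and regress deviations, delivering accurate and interpretable forecasts and outperforming existing methods on real-world datasets.

Details
AI Chinese abstract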

近年来深度预测模型取得了显著性能,但大多数方法仍难以同时提供准确的预测和对时间动态的可解释洞察。本文提出CaReTS,一种新颖的多任务学习框架,结合分类和回归任务用于多步时间序列预测问题。该框架采用双流架构,其中分类分支学习未来的逐步趋势,而回归分支估计目标变量最新观测值的相应偏差。双流设计通过分离目标变量的宏观趋势和微观偏差,提供更具可解释性的预测。为了在输出预测、偏差估计和趋势分类中实现有效学习,我们设计了一个具有不确定性加权机制的多任务损失,以自适应平衡每个任务的贡献。此外,在该框架下实例化了四种变体(CaReTS1-4),以集成主流时序建模编码器,包括卷积神经网络(CNN)、长短期记忆网络(LSTM)和Transformer。在真实数据集上的实验表明,CaReTS在预测准确性上优于最先进的算法,同时实现了更高的趋势分类性能。

English abstract

Short-term energy forecasting plays an important role in real-time operational decision-making, such as electricity market bidding and power system dispatch, where both numerical accuracy and correct directional signals are essential. However, most existing forecasting approaches formulate the problem purely as a regression task, limiting their ability to explicitly capture stepwise directional movements and trend consistency required for operational decisions. To address this limitation, this paper proposes a trend-aware multi-task forecasting framework that decomposes forecasting outputs into directional movements and deviation magnitudes relative to the latest observation, enabling both accurate numerical prediction and interpretable trend-aware outputs. The framework adopts a task-specific dual-stream architecture and explores key design choices for integrating trend and deviation information, including hard versus probabilistic trend representations, symmetric versus asymmetric deviation modelling, and parallel versus sequential conditioning strategies. To stabilize multi-task learning and reduce manual tuning, an uncertainty-aware task weighting scheme is incorporated to automatically balance directional classification, deviation regression, and final output prediction during training. Experimental results on real-world energy datasets demonstrate that the proposed framework achieves competitive numerical accuracy compared with state-of-the-art algorithms, while consistently improving trend prediction performance with moderate computational cost. This capability is particularly beneficial in short-term energy system management, where consistent directional forecasting can provide more reliable decision support for practical operational scenarios such as market bidding, resource scheduling, and risk-aware energy management.
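The decomposition described above can be sketched in a few lines. The sketch assumes that every forecast step is expressed relative to the single latest observation, as a direction label in {-1, 0, +1} plus a non-negative deviation magnitude. The weighting function uses the common learned log-variance form of uncertainty weighting, which is an assumption, since the abstract does not give the exact scheme. All function names are hypothetical.

```python
import math


def decompose(last_obs, future):
    """Split future values into direction labels and deviation magnitudes."""
    directions = [1 if y > last_obs else -1 if y < last_obs else 0 for y in future]
    magnitudes = [abs(y - last_obs) for y in future]
    return directions, magnitudes


def reconstruct(last_obs, directions, magnitudes):
    """Recombine the two streams into numerical forecasts."""
    return [last_obs + d * m for d, m in zip(directions, magnitudes)]


def uncertainty_weighted_loss(task_losses, log_vars):
    """Sum of exp(-s_i) * L_i + s_i, with one learnable log-variance s_i per task."""
    return sum(math.exp(-s) * loss + s for loss, s in zip(task_losses, log_vars))


last_obs, future = 100.0, [102.0, 99.5, 100.0]
directions, magnitudes = decompose(last_obs, future)
print(directions, magnitudes)
print(reconstruct(last_obs, directions, magnitudes))
```

In a trained model, the classification stream would predict `directions` and the regression stream would predict `magnitudes`; separating them is what makes the directional signal inspectable on its own.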

2511.05017 2026-06-15 cs.CV cs.CL version update

Towards Mitigating Hallucinations in Large Vision-Language Models by Refining Textual Embeddings

通过细化文本嵌入缓解大型视觉语言模型中的幻觉

Aakriti Agrawal, Gouthaman KV, Rohith Aralikatti, Gauri Jagatap, Jiaxin Yuan, Sarvesh Baskar, Vijay Kamarshi, Andrea Fanelli, Furong Huang

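Affiliations * University of Maryland; Dolby Laboratories; Capital One

AI summary Addresses hallucinations in large vision-language models caused by over-reliance on textual priors at the expense of visual cues. Proposes a simple and effective visual feature incorporation method that learns visually informed textual embeddings to balance the attention distribution, significantly reducing hallucinations and improving multimodal reasoning.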

Comments Accepted at The 64th Annual Meeting of the Association for Computational Linguistics

Details
AI Chinese abstract

大型视觉语言模型(LVLMs)中的幻觉仍然是一个持续的挑战,通常源于多模态推理过程中视觉信息整合不足。一个关键原因是模型过度依赖文本先验而未能充分利用视觉线索,导致输出语言流畅但视觉上不准确。例如,给定一张空厨房台面的图像,LVLM可能会根据语言关联而非视觉证据幻觉出“一碗水果”或“一杯咖啡”。大多数LVLM通过将视觉特征附加到预训练LLM的输入流中,并在大规模视觉语言数据集上训练来整合视觉特征。我们的系统分析表明,由于LLM对语言主导表示的固有偏见,这种策略往往导致对文本信息的过度依赖。这种不平衡使注意力偏向文本而非视觉内容,削弱了模型将输出基于视觉输入的能力。为了解决这个问题,我们提出了一种简单而有效的视觉特征融入方法,鼓励模型学习与基础LLM不同的视觉信息化的文本嵌入,并促进更平衡的注意力分布。在多个幻觉基准上的实验结果表明,我们的方法显著减少了幻觉,并促进了更平衡的多模态推理。值得注意的是,我们的方法取得了显著提升,包括在MMVP-MLLM上+9.33%,在POPE-AOKVQA上+2.99%,在Merlin上高达+3.4%,以及在HallusionBench的硬数据分割上+3%。

English abstract

Hallucinations in Large Vision-Language Models (LVLMs) remain a persistent challenge, often stemming from inadequate integration of visual information during multimodal reasoning. A key cause is the model's over-reliance on textual priors and underutilization of visual cues, leading to outputs that are linguistically fluent but visually inaccurate. For example, given an image of an empty kitchen countertop, an LVLM might hallucinate a "bowl of fruit" or "cup of coffee", relying on language associations rather than visual evidence. Most LVLMs incorporate visual features by appending them to the input stream of a pre-trained LLM and training on large-scale vision-language datasets. Our systematic analysis reveals that this strategy often leads to over-dependence on textual information due to the inherent bias of LLMs towards language-dominant representations. This imbalance skews attention towards the text over visual content, weakening the model's ability to ground outputs in visual inputs. To address this, we propose a simple yet effective visual feature incorporation method that encourages the model to learn visually-informed textual embeddings distinct from those of the base LLM and promotes a more balanced attention distribution. Experimental results across multiple hallucination benchmarks demonstrate that our method significantly reduces hallucinations and fosters more balanced multimodal reasoning. Notably, our approach achieves substantial gains, including +9.33% on MMVP-MLLM, +2.99% on POPE-AOKVQA, up to +3.4% on Merlin, and +3% on the hard-data split of HallusionBench.

2510.05150 2026-06-15 cs.CL cs.AI version update

Chronological Thinking in Full-Duplex Spoken Dialogue Language Models

全双工口语对话语言模型中的时间顺序思考

Donghang Wu, Haoyang Zhang, Chen Chen, Tianyu Zhang, Fei Tian, Xuerui Yang, Gang Yu, Hexin Liu, Nana Hou, Yuchen Hu, Eng Siong Chng

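Affiliations * Nanyang Technological University; StepFun; Mila

AI summary Proposes Chronological Thinking, a mechanism that lets a full-duplex dialogue model reason incrementally while listening to the user, improving response quality without adding latency.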

Comments Accepted by SIGDIAL 2026

Details
AI Chinese abstract

近期口语对话语言模型(SDLMs)的进展反映了从轮次式向全双工系统转变的日益增长的兴趣,其中模型在生成响应的同时持续感知用户语音流。这种同时听和说的设计实现了实时交互,并且智能体可以处理动态对话行为,如用户插话。然而,在听阶段,现有系统通过重复预测静默标记使智能体保持空闲,这偏离了人类行为:我们在对话中通常进行轻量级思考,而不是心不在焉。受此启发,我们提出了Chronological Thinking,一种即时对话思考机制,旨在提高全双工SDLMs的响应质量。具体来说,Chronological Thinking从传统的LLM思考方法(如思维链)中进行了范式转变,专为流式声学输入而设计。(1)严格因果:智能体在听的同时增量推理,仅从过去的音频更新内部假设,无前瞻。(2)无额外延迟:推理在听窗口期间分摊;一旦用户停止说话,智能体停止思考并立即开始说话,无进一步延迟。实验通过客观指标和人工评估证明了Chronological Thinking的有效性,在响应质量上表现出一致的改进。此外,Chronological Thinking稳健地处理对话动态,并在全双工交互指标上取得了竞争性性能。

English abstract

Recent advances in spoken dialogue language models (SDLMs) reflect growing interest in shifting from turn-based to full-duplex systems, where the models continuously perceive user speech streams while generating responses. This simultaneous listening and speaking design enables real-time interaction and lets the agent handle dynamic conversational behaviors such as user barge-in. However, during the listening phase, existing systems keep the agent idle by repeatedly predicting the silence token, which departs from human behavior: we usually engage in lightweight thinking during conversation rather than remaining absent-minded. Inspired by this, we propose Chronological Thinking, an on-the-fly conversational thinking mechanism that aims to improve response quality in full-duplex SDLMs. Specifically, chronological thinking is purpose-built for streaming acoustic input and represents a paradigm shift from conventional LLM thinking approaches such as Chain-of-Thought. (1) Strictly causal: the agent reasons incrementally while listening, updating internal hypotheses only from past audio with no lookahead. (2) No additional latency: reasoning is amortized during the listening window; once the user stops speaking, the agent halts thinking and begins speaking without further delay. Experiments with both objective metrics and human evaluations demonstrate the effectiveness of chronological thinking, showing consistent improvements in response quality. Furthermore, chronological thinking robustly handles conversational dynamics and attains competitive performance on full-duplex interaction metrics.

2502.00336 2026-06-15 cs.LG stat.ML version update

Denoising Score Matching with Random Features: Insights on Diffusion Models from Precise Learning Curves

随机特征去噪分数匹配:从精确学习曲线看扩散模型

Anand Jerry George, Rodrigo Veiga, Nicolas Macris

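Affiliations * École Polytechnique Fédérale de Lausanne (EPFL)

AI summary Parameterizes the score function with a random features neural network and derives asymptotically precise errors for denoising score matching, revealing how model complexity, the amount of data, and the number of noise samples affect generalization and memorization in diffusion models.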

Comments Published at AISTATS 2026

Details
AI Chinese abstract

我们从理论上研究扩散模型中的泛化和记忆现象。实证研究表明,这些现象受模型复杂度和训练数据集大小的影响。在我们的实验中,我们进一步观察到去噪分数匹配(DSM)中每个数据样本使用的噪声样本数($m$)起着显著且非平凡的作用。我们通过在一个简单理论设置下推导DSM测试误差和训练误差的渐近精确表达式,捕捉这些行为并揭示其机制。分数函数由随机特征神经网络参数化,目标分布为$d$维高斯分布。我们在维度$d$、数据样本数$n$和特征数$p$趋于无穷大,同时保持比率$\psi_n=\frac{n}{d}$和$\psi_p=\frac{p}{d}$固定的情况下进行操作。通过刻画测试和训练误差,我们确定了作为$\psi_n$、$\psi_p$和$m$函数的泛化和记忆区域。我们的理论发现与实证观察一致。

English abstract

We theoretically investigate the phenomena of generalization and memorization in diffusion models. Empirical studies suggest that these phenomena are influenced by model complexity and the size of the training dataset. In our experiments, we further observe that the number of noise samples per data sample ($m$) used during Denoising Score Matching (DSM) plays a significant and non-trivial role. We capture these behaviors and shed light on their mechanisms by deriving asymptotically precise expressions for test and train errors of DSM under a simple theoretical setting. The score function is parameterized by random features neural networks, with the target distribution being a $d$-dimensional Gaussian. We operate in a regime where the dimension $d$, number of data samples $n$, and number of features $p$ tend to infinity while keeping the ratios $\psi_n=\frac{n}{d}$ and $\psi_p=\frac{p}{d}$ fixed. By characterizing the test and train errors, we identify regimes of generalization and memorization as a function of $\psi_n$, $\psi_p$, and $m$. Our theoretical findings are consistent with the empirical observations.
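The role of $m$ is easiest to see in the empirical DSM objective itself, where each of the $n$ data samples is paired with $m$ independent noise draws. The sketch below is a simplified illustration, not the paper's setup: it uses additive noise $x + \sigma z$ at a single noise level and a one-parameter linear score $s(x) = -c\,x$ in place of a random features network. For standard Gaussian data the true score of the noised distribution is $-x/(1+\sigma^2)$, so the loss should be smallest near $c = 1/(1+\sigma^2)$.

```python
import random


def dsm_loss(score, data, m, sigma, rng):
    """Empirical DSM loss: average of ||score(x + sigma*z) + z/sigma||^2 over n*m pairs."""
    total, count = 0.0, 0
    for x in data:
        for _ in range(m):  # m fresh noise samples per data sample
            z = [rng.gauss(0.0, 1.0) for _ in x]
            noisy = [xi + sigma * zi for xi, zi in zip(x, z)]
            s = score(noisy)
            total += sum((si + zi / sigma) ** 2 for si, zi in zip(s, z))
            count += 1
    return total / count


def linear_score(c):
    """Hypothetical one-parameter score model s(x) = -c * x."""
    return lambda x: [-c * xi for xi in x]


d, n, m, sigma = 5, 200, 4, 1.0
data_rng = random.Random(0)
data = [[data_rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

c_star = 1.0 / (1.0 + sigma ** 2)
# Reuse the same noise seed so the three losses are compared on identical noise.
loss_star = dsm_loss(linear_score(c_star), data, m, sigma, random.Random(1))
loss_zero = dsm_loss(linear_score(0.0), data, m, sigma, random.Random(1))
loss_one = dsm_loss(linear_score(1.0), data, m, sigma, random.Random(1))
print(loss_star, loss_zero, loss_one)
```

Increasing `m` reuses each data point with more noise draws, which is the knob whose effect on generalization and memorization the abstract analyzes.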

2510.01663 2026-06-15 cs.LG cs.AI version update

Shift-Invariant Attribute Scoring for Kolmogorov-Arnold Networks via Shapley Value

基于Shapley值的Kolmogorov-Arnold网络平移不变属性评分

Wangxuan Fan, Ching Wang, Siqi Li, Nan Liu

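Affiliations * GitHub

AI summary Proposes the ShapKAN framework, which uses Shapley value attribution for shift-invariant assessment of node importance, effectively compressing KANs while preserving their interpretability advantages.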

Comments 14 pages, 6 figures, 9 tables

Details
AI Chinese abstract

对于许多实际应用,理解特征与结果之间的关系与实现高预测准确性同样重要。虽然传统神经网络在预测方面表现出色,但其黑箱性质掩盖了潜在的功能关系。Kolmogorov-Arnold网络(KAN)通过在边上采用可学习的基于样条的激活函数来解决这一问题,能够在保持竞争性能的同时恢复符号表示。然而,KAN的架构对网络剪枝提出了独特的挑战。由于对输入坐标平移的敏感性,传统的基于幅度的方法变得不可靠。我们提出了\textbf{ShapKAN},一种使用Shapley值归因以平移不变方式评估节点重要性的剪枝框架。与基于幅度的方法不同,ShapKAN量化每个节点的实际贡献,确保无论输入参数化如何,重要性排名保持一致。在合成和真实世界数据集上的大量实验表明,ShapKAN在实现有效网络压缩的同时保留了真实的节点重要性。我们的方法提升了KAN的可解释性优势,便于在资源受限环境中部署。

English abstract

For many real-world applications, understanding feature-outcome relationships is as crucial as achieving high predictive accuracy. While traditional neural networks excel at prediction, their black-box nature obscures underlying functional relationships. Kolmogorov-Arnold Networks (KANs) address this by employing learnable spline-based activation functions on edges, enabling recovery of symbolic representations while maintaining competitive performance. However, KAN's architecture presents unique challenges for network pruning. Conventional magnitude-based methods become unreliable due to sensitivity to input coordinate shifts. We propose ShapKAN, a pruning framework using Shapley value attribution to assess node importance in a shift-invariant manner. Unlike magnitude-based approaches, ShapKAN quantifies each node's actual contribution, ensuring consistent importance rankings regardless of input parameterization. Extensive experiments on synthetic and real-world datasets demonstrate that ShapKAN preserves true node importance while enabling effective network compression. Our approach improves KAN's interpretability advantages, facilitating deployment in resource-constrained environments.
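As background for the attribution step, the exact Shapley value of a player is its marginal contribution averaged over all coalitions of the other players. The sketch below computes it by full enumeration for a toy three-node game. The value function, node names, and interaction bonus are invented for illustration, and ShapKAN's actual value function and any approximation it uses are not described in the abstract.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition of the other players."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                coalition = frozenset(subset)
                total += weight * (value(coalition | {p}) - value(coalition))
        phi[p] = total
    return phi


def toy_value(coalition):
    """Hypothetical network quality when only the nodes in `coalition` are kept."""
    base = {"a": 1.0, "b": 2.0, "c": 0.0}
    bonus = 3.0 if {"a", "b"} <= coalition else 0.0  # a and b interact
    return sum(base[node] for node in coalition) + bonus


phi = shapley_values(["a", "b", "c"], toy_value)
print(phi)
```

Node `c` contributes nothing in any coalition and receives a value of zero, which marks it as a pruning candidate in this toy game. Because the scores depend only on changes in the value function, they are unaffected by how each node's activations are parameterized.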

2510.00375 2026-06-15 cs.LG cs.HC version update

Multidimensional Bayesian Active Machine Learning of Working Memory Task Performance

工作记忆任务表现的多维贝叶斯主动机器学习

Dom CP Marticorena, Chris Wissmann, Zeyu Lu, Dennis L Barbour

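Affiliations * Department of Biomedical Engineering, Washington University; Department of Computer Science and Engineering, Washington University

AI summary Proposes a Bayesian two-dimensional active classification method that controls spatial load and feature-binding load in a virtual environment and uses a Gaussian process classifier to estimate the performance surface, converging quickly and revealing individual differences.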

Comments 41 pages, 7 figures

Details
AI Chinese abstract

虽然自适应实验设计已经超越了一维阶梯式自适应,但大多数认知实验仍然控制单个因素并用标量总结表现。我们展示了一种贝叶斯双轴主动分类方法的验证,该方法在沉浸式虚拟测试环境中针对5×5工作记忆重建任务进行。控制两个变量:项目的空间负荷L(占用瓦片数量)和特征绑定负荷K(不同颜色数量)。刺激获取由非参数高斯过程(GP)概率分类器的后验不确定性引导,该分类器输出(L, K)上的曲面,而不是单个阈值或最大跨度值。在年轻成人群体中,我们将GP驱动的自适应模式(AM)与传统的自适应阶梯经典模式(CM)进行比较,后者仅在K=3时变化L。在该队列中,两种方法之间达到一致性,在K=3时组内相关系数为0.755。此外,AM揭示了空间负荷和特征绑定之间交互作用的个体差异。AM估计比其他采样策略收敛更快,表明仅需约30个样本即可准确拟合完整模型。

English abstract

While adaptive experimental design has outgrown one-dimensional, staircase-based adaptations, most cognitive experiments still control a single factor and summarize performance with a scalar. We present a validation of a Bayesian, two-axis, active-classification approach, carried out in an immersive virtual testing environment for a 5-by-5 working-memory reconstruction task. Two variables are controlled: spatial load L (number of occupied tiles) and feature-binding load K (number of distinct colors) of items. Stimulus acquisition is guided by posterior uncertainty of a nonparametric Gaussian Process (GP) probabilistic classifier, which outputs a surface over (L, K) rather than a single threshold or max span value. In a young adult population, we compare GP-driven Adaptive Mode (AM) with a traditional adaptive staircase Classic Mode (CM), which varies L only at K = 3. Parity between the methods is achieved for this cohort, with an intraclass correlation coefficient of 0.755 at K = 3. Additionally, AM reveals individual differences in interactions between spatial load and feature binding. AM estimates converge more quickly than other sampling strategies, demonstrating that only about 30 samples are required for accurate fitting of the full model.

2507.13263 2026-06-15 cs.LG cs.AI version update

From Sorting Algorithms to Scalable Kernels: Bayesian Optimization in High-Dimensional Permutation Spaces

从排序算法到可扩展核:高维排列空间中的贝叶斯优化

Zikai Xie, Linjiang Chen

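Affiliations * State Key Laboratory of Precision and Intelligent Chemistry

AI summary Addresses the poor scalability of representations in Bayesian optimization over high-dimensional permutation spaces by proposing a framework of kernel functions derived from sorting algorithms. The Mallows kernel is a special case derived from enumeration sort, while the newly proposed Merge Kernel uses the decomposition structure of merge sort to reach $\Theta(n\log n)$ complexity with no information loss; it performs comparably in low dimensions and significantly improves optimization performance and computational efficiency in high dimensions.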

Comments 9 pages, published on ICLR-26

Details
AI Chinese abstract

贝叶斯优化(BO)是黑箱优化的强大工具,但其在高维排列空间中的应用受到定义可扩展表示的严重限制。当前最先进的排列空间BO方法依赖于穷举的Ω(n^2)成对比较,导致密集表示,不适用于大规模排列。为了突破这一障碍,我们引入了一个新框架,通过从排序算法导出的核函数生成高效的排列表示。在该框架中,Mallows核可以被视为从枚举排序导出的特例。此外,我们引入了Merge核,它利用归并排序的分治结构生成紧凑的Θ(n log n)表示,实现了最低可能复杂度且无信息损失,并有效捕捉排列结构。我们的核心论点是,Merge核在低维设置中与Mallows核性能相当,但随着维度n增长,在优化性能和计算效率上显著优于后者。在各种排列优化基准上的广泛评估证实了我们的假设,表明Merge核为高维排列空间中的贝叶斯优化提供了可扩展且更有效的解决方案,从而释放了解决以前难以处理的问题(如大规模特征排序和组合神经架构搜索)的潜力。

English abstract

Bayesian Optimization (BO) is a powerful tool for black-box optimization, but its application to high-dimensional permutation spaces is severely limited by the challenge of defining scalable representations. The current state-of-the-art BO approach for permutation spaces relies on an exhaustive $\Omega(n^2)$ pairwise comparison, inducing a dense representation that is impractical for large-scale permutations. To break this barrier, we introduce a novel framework for generating efficient permutation representations via kernel functions derived from sorting algorithms. Within this framework, the Mallows kernel can be viewed as a special instance derived from enumeration sort. Further, we introduce the Merge Kernel, which leverages the divide-and-conquer structure of merge sort to produce a compact $\Theta(n\log n)$ representation that achieves the lowest possible complexity with no information loss and effectively captures permutation structure. Our central thesis is that the Merge Kernel performs competitively with the Mallows kernel in low-dimensional settings, but significantly outperforms it in both optimization performance and computational efficiency as the dimension $n$ grows. Extensive evaluations on various permutation optimization benchmarks confirm our hypothesis, demonstrating that the Merge Kernel provides a scalable and more effective solution for Bayesian optimization in high-dimensional permutation spaces, thereby unlocking the potential for tackling previously intractable problems such as large-scale feature ordering and combinatorial neural architecture search.
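The quadratic baseline mentioned above can be sketched directly: each permutation is encoded by one feature per item pair, and the Mallows kernel decays exponentially in the number of pairs on which two permutations disagree (the Kendall tau distance). This illustrates only the pairwise-comparison representation. The Merge Kernel's construction is not given in the abstract and is not reproduced here, and the decay parameter `lam` and its default are assumptions.

```python
from itertools import combinations
from math import exp


def pairwise_features(ranking):
    """One feature per pair i < j: +1 if item i is ranked before item j, else -1.

    `ranking[i]` is the position of item i, so there are n*(n-1)/2 features.
    """
    n = len(ranking)
    return [1 if ranking[i] < ranking[j] else -1
            for i, j in combinations(range(n), 2)]


def kendall_tau_distance(r1, r2):
    """Number of item pairs that the two rankings order differently."""
    return sum(a != b for a, b in zip(pairwise_features(r1), pairwise_features(r2)))


def mallows_kernel(r1, r2, lam=0.5):
    """Mallows kernel exp(-lam * d_tau(r1, r2)); `lam` is a hypothetical default."""
    return exp(-lam * kendall_tau_distance(r1, r2))


identity = [0, 1, 2, 3]
one_swap = [1, 0, 2, 3]
reversed_order = [3, 2, 1, 0]
print(kendall_tau_distance(identity, one_swap),
      kendall_tau_distance(identity, reversed_order))
```

For n = 4 the feature vector already has 6 entries, and it grows quadratically with n, which is the scalability issue the abstract attributes to the pairwise approach.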

2508.18693 2026-06-15 cs.CV version update

Feature-Space Planes Searcher: A Universal Domain Adaptation Framework for Interpretability and Computational Efficiency

特征空间平面搜索器:一种通用领域自适应框架,兼顾可解释性与计算效率

Zhitong Cheng, Yiran Jiang, Yulong Ge, Yufeng Li, Zhongheng Qin, Rongzhi Lin, Jianwei Ma

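Affiliations * School of Mathematics and Institute for Artificial Intelligence, Harbin Institute of Technology, China; School of Earth and Space Sciences, Institute for Artificial Intelligence, Peking University, China

AI summary Proposes the Feature-space Planes Searcher (FPS), which keeps the feature encoder frozen and exploits geometric patterns in feature space to optimize decision boundaries, enabling efficient and interpretable domain adaptation with competitive performance on multiple benchmarks.

Details
AI Chinese abstract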

领域偏移,即从标记源域到未标记目标域时模型性能下降,是部署深度学习系统的一个持续挑战。当前的无监督领域自适应(UDA)方法主要依赖于微调特征提取器,这种方法存在效率低、可解释性差以及对现代架构扩展性不足的问题。我们的分析表明,在大规模数据上预训练的模型在其特征空间中表现出域不变的几何模式,以类内聚类和类间分离为特征,从而保留了可迁移的判别结构。这些发现表明,领域偏移主要表现为边界不对齐而非特征退化。与微调整个预训练模型(这有引入不可预测特征失真的风险)不同,我们提出特征空间平面搜索器(FPS):一种新颖的领域自适应框架,通过利用这些几何模式优化决策边界,同时保持特征编码器冻结。这种简化的方法能够对自适应进行解释性分析,同时通过离线特征提取大幅降低内存和计算成本,允许在单个计算周期内进行全数据集优化。在公共基准上的评估表明,FPS达到了与最先进方法竞争或更优的性能。FPS能够高效地扩展到多模态大模型,并在包括蛋白质结构预测、遥感分类和地震检测在内的多个领域展现出通用性。我们预计FPS将为迁移学习,特别是领域自适应任务,提供一种简单、有效且可推广的范式。

English abstract

Domain shift, characterized by degraded model performance during transition from labeled source domains to unlabeled target domains, poses a persistent challenge for deploying deep learning systems. Current unsupervised domain adaptation (UDA) methods predominantly rely on fine-tuning feature extractors, an approach limited by inefficiency, reduced interpretability, and poor scalability to modern architectures. Our analysis reveals that models pretrained on large-scale data exhibit domain-invariant geometric patterns in their feature space, characterized by intra-class clustering and inter-class separation, thereby preserving transferable discriminative structures. These findings indicate that domain shifts primarily manifest as boundary misalignment rather than feature degradation. Unlike fine-tuning entire pre-trained models, which risks introducing unpredictable feature distortions, we propose the Feature-space Planes Searcher (FPS): a novel domain adaptation framework that optimizes decision boundaries by leveraging these geometric patterns while keeping the feature encoder frozen. This streamlined approach enables interpretative analysis of adaptation while substantially reducing memory and computational costs through offline feature extraction, permitting full-dataset optimization in a single computation cycle. Evaluations on public benchmarks demonstrate that FPS achieves competitive or superior performance to state-of-the-art methods. FPS scales efficiently with multimodal large models and shows versatility across diverse domains including protein structure prediction, remote sensing classification, and earthquake detection. We anticipate FPS will provide a simple, effective, and generalizable paradigm for transfer learning, particularly in domain adaptation tasks.

2508.06196 2026-06-15 cs.CL cs.HC Version update

EiCAP: Beyond Fluency, Probing and Improving Emotional Intelligence in LLMs via Psychologically Grounded Multi-Turn Dialogue

Nizi Nazar, Pardis Sadat Zahraei, Dilek Hakkani-Tür, Natasa Milic-Frayling, Ehsaneddin Asgari

Affiliations * Qatar Computing Research Institute (QCRI), Hamad Bin Khalifa University; University of Illinois Urbana-Champaign (UIUC)

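AI summary Proposes EiCAP, a framework built on a psychologically grounded six-layer emotional intelligence (EI) taxonomy, comprising the evaluation benchmark EiCAP-Bench and the fine-tuning corpus EiCAP-SFT; finds that generic conversational fine-tuning does not improve EI, whereas EI-grounded LoRA fine-tuning significantly improves performance in all 24 subcategories.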

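Details
AI-generated abstract (translated from Chinese)

Large language models are increasingly used in emotionally sensitive roles, including mental health support, education, and crisis response, yet they lack a principled framework for evaluating or improving emotional intelligence (EI). We introduce EiCAP, a unified, psychologically grounded six-layer EI taxonomy, realized as two complementary resources. EiCAP-Bench is a multi-turn, one-vs-three forced-choice evaluation suite of 3,174 probes covering 24 subcategories, with cross-turn dependencies that reflect real conversational EI demands. EiCAP-SFT is a supervised corpus of 152,820 dialogues aligned with the same taxonomy, supporting controlled, interpretable fine-tuning. Two key findings emerge. First, generic conversational supervised fine-tuning does not confer EI: fine-tuning on UltraChat yields no significant gain in any of the 24 subcategories, with a macro score of 24.6%, close to the 25% chance level. Second, applying EI-grounded LoRA (using about 0.8% of the parameters) directly to Qwen-2.5-7B-Base yields significant gains in all 24 subcategories, with a macro score of 75.33%, which is 51.7 percentage points above Base and 37.1 percentage points above Instruct. Crucially, an ablation shows that the UltraChat pre-stage is counterproductive, lowering performance by 21.4 percentage points: direct EI-grounded training is both necessary and sufficient.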

English abstract

Large Language Models increasingly serve in emotionally sensitive roles, including mental health support, education, and crisis response, yet they lack a principled framework for assessing or improving Emotional Intelligence (EI). We introduce EiCAP, a unified, psychologically grounded six-layer EI taxonomy operationalized into two complementary resources. EiCAP-Bench is a multi-turn, one-vs-three forced-choice evaluation suite with 3,174 probes across 24 subcategories and cross-turn dependencies that reflect real conversational EI demands. EiCAP-SFT is a 152,820-dialogue supervision corpus aligned to the same taxonomy, enabling controlled, interpretable fine-tuning. Two key findings emerge. First, generic conversational supervised fine-tuning does not confer EI: fine-tuning on UltraChat yields no significant gain in any of the 24 subcategories, with a macro score of 24.6%, near the chance level of 25%. Second, applying EI-grounded LoRA, using approximately 0.8% of parameters, directly to Qwen-2.5-7B-Base achieves significant gains in all 24 subcategories, reaching a macro score of 75.33%, a gain of 51.7 percentage points over Base and 37.1 percentage points over Instruct. Crucially, an ablation shows that the UltraChat pre-stage is counterproductive, reducing performance by 21.4 percentage points: direct EI-grounded training is both necessary and sufficient.
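
As a rough illustration of the two reference points in the abstract above, the sketch below computes the chance level for a one-vs-three forced-choice probe (one correct option among four) and a macro score taken as the unweighted mean of per-subcategory accuracies. The function names and the toy accuracies are hypothetical, and treating the macro score as an unweighted mean is an assumption, since the abstract does not define it.

```python
def chance_level(num_distractors: int) -> float:
    """Probability of picking the single correct option uniformly at random."""
    return 1.0 / (1 + num_distractors)


def macro_score(per_subcategory_accuracy: dict[str, float]) -> float:
    """Unweighted mean over subcategories (assumed definition of 'macro')."""
    if not per_subcategory_accuracy:
        raise ValueError("need at least one subcategory")
    return sum(per_subcategory_accuracy.values()) / len(per_subcategory_accuracy)


# One correct answer against three distractors, as in a one-vs-three probe.
chance = chance_level(num_distractors=3)

# Toy accuracies for three hypothetical subcategories (not results from the paper).
toy_accuracies = {"subcategory_a": 0.20, "subcategory_b": 0.25, "subcategory_c": 0.30}
toy_macro = macro_score(toy_accuracies)

print(f"chance level: {chance:.2%}, toy macro score: {toy_macro:.2%}")
```

Under this reading, a macro score of 24.6% sits just below the 25% expected from uniform guessing, which is why the abstract describes the UltraChat result as near chance.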

2508.05782 2026-06-15 cs.CL Version update

FineDialFact: A benchmark for Fine-grained Dialogue Fact Verification

Xiangyan Chen, Yufeng Li, Yujian Gan, Arkaitz Zubiaga, Matthew Purver

Affiliations * arXiv

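AI summary To address the coarse granularity of factual labels in hallucination detection for dialogue systems, proposes FineDialFact, a benchmark for fine-grained dialogue fact verification that is built from public datasets and evaluated with several baseline methods; experiments show that Chain-of-Thought reasoning improves performance, but the best F1 is only 0.74, so the task remains challenging.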

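Details
AI-generated abstract (translated from Chinese)

Large language models are known to hallucinate, producing factually incorrect or fabricated information, which poses a major challenge for many natural language processing applications such as dialogue systems. Detecting hallucinations has therefore become a key research area. Current approaches to hallucination detection in dialogue systems focus mainly on verifying the factual consistency of generated responses. However, these responses often mix accurate, inaccurate, and non-verifiable facts, so a single factual label is overly simplistic and coarse-grained. In this paper, we introduce FineDialFact, a benchmark for fine-grained dialogue fact verification, which involves verifying atomic facts extracted from dialogue responses. To this end, we construct a dataset from publicly available dialogue datasets and evaluate it with a range of baseline methods. Experimental results show that methods incorporating Chain-of-Thought reasoning can improve dialogue fact verification performance. Even so, the best F1 score on the open-domain dialogue dataset HybriDialogue is only 0.74, indicating that the benchmark remains a challenging task for future research. We release the dataset and code at https://github.com/XiangyanChen/FineDialFact.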

English abstract

Large language models are known to produce hallucinations, meaning factually incorrect or fabricated information, which poses significant challenges for many natural language processing applications, such as dialogue systems. As a result, detecting hallucinations has become a critical area of research. Current approaches to hallucination detection in dialogue systems primarily focus on verifying the factual consistency of generated responses. However, these responses often contain a mix of accurate, inaccurate, and non-verifiable facts, making the use of a single factual label overly simplistic and coarse-grained. In this paper, we introduce a benchmark, FineDialFact, for fine-grained dialogue fact verification, which involves verifying atomic facts extracted from dialogue responses. To support this, we construct a dataset based on publicly available dialogue datasets and evaluate it using various baseline methods. Experimental results demonstrate that methods incorporating Chain-of-Thought reasoning can enhance performance in dialogue fact verification. Despite this, the best F1-score achieved on HybriDialogue, an open-domain dialogue dataset, is only 0.74, indicating that the benchmark remains a challenging task for future research. We release our dataset and code at https://github.com/XiangyanChen/FineDialFact.
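
For readers unfamiliar with how an F1 score over atomic-fact labels might be computed, here is a minimal sketch. It assumes three label names taken from the abstract's wording (accurate, inaccurate, non-verifiable) and macro-averaging across them. The abstract does not state which averaging the benchmark uses, and the toy gold and predicted labels are invented for illustration.

```python
def f1_for_label(gold, pred, label):
    """F1 for one label, treating it as the positive class."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred, labels):
    """Unweighted mean of per-label F1 scores (assumed averaging scheme)."""
    return sum(f1_for_label(gold, pred, label) for label in labels) / len(labels)


# Hypothetical label set and toy annotations, one entry per atomic fact.
LABELS = ["accurate", "inaccurate", "non-verifiable"]
gold = ["accurate", "accurate", "inaccurate", "non-verifiable"]
pred = ["accurate", "inaccurate", "inaccurate", "non-verifiable"]

score = macro_f1(gold, pred, LABELS)
print(f"macro-F1 on the toy data: {score:.3f}")
```

Scoring each atomic fact separately is what makes the evaluation fine-grained: a response with one wrong fact among several correct ones is penalized only for that fact, not labeled wrong as a whole.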