Abstract

Self-healing smart grids can quickly adjust their network configuration during outages to minimize power disruptions. During an outage, several actions can be taken, such as network reconfiguration through switching operations and emergency load shedding. However, traditional machine learning methods for outage mitigation are not well suited for smart grids due to their slow response time and high computational cost. To address these challenges, recent studies have explored reinforcement learning to automatically perform network reconfiguration. In these approaches, the control policy is typically modeled using a graph neural network (GNN). However, conventional GNNs operate in the spatial domain and may fail to capture important relationships in the frequency domain. Frequency-domain information is particularly useful for modeling global structural patterns and system-wide interactions in power networks. In this paper, we propose a spectral graph reinforcement learning framework for outage management in distribution networks to enhance system resilience. Our model learns the optimal power restoration policy using a spectral graph neural network. We evaluate the proposed method on three modified IEEE test systems: the 13-bus, 34-bus, and 123-bus networks. Experimental results show that our approach achieves near-optimal performance in real time and generalizes well across a wide range of outage scenarios.
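As a rough illustration of what a frequency-domain view of a graph can look like, the sketch below applies a polynomial filter in the graph Laplacian, one common way to realize a spectral filter without an explicit eigendecomposition. The toy 4-bus path feeder, the coefficients in `theta`, and all function names are assumptions made for illustration; the abstract does not specify the architecture the authors use.

```python
def laplacian(adj):
    """Combinatorial graph Laplacian L = D - A for a dense adjacency matrix."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def matvec(mat, vec):
    """Dense matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


def poly_spectral_filter(adj, x, theta):
    """Return y = sum_k theta[k] * L^k x (a K-hop polynomial spectral filter)."""
    lap = laplacian(adj)
    y = [0.0] * len(x)
    power = list(x)  # holds L^k x, starting with k = 0
    for t in theta:
        y = [yi + t * pi for yi, pi in zip(y, power)]
        power = matvec(lap, power)
    return y


# Hypothetical 4-bus feeder laid out as a path: 0 - 1 - 2 - 3.
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]

# A constant signal is the zero-frequency mode (L x = 0), so only theta[0] acts on it.
flat = poly_spectral_filter(adj, [1.0, 1.0, 1.0, 1.0], [0.5, 0.3, 0.1])

# A spike at bus 0 spreads to its neighbour after one application of L.
spike = poly_spectral_filter(adj, [1.0, 0.0, 0.0, 0.0], [0.0, 1.0])
```

A filter of order K mixes information from nodes up to K hops apart, which is one way to read the abstract's remark about global structural patterns.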

2606.07582 2026-06-09 cs.LG cs.AI cs.ET New submission

Customer Churn Prediction on Structured Data Using FT-Transformer and Stacking Ensembles

Joyjit Roy, Samaresh Kumar Singh, Laxmi Shaw

Affiliations: Independent Researcher, Austin, TX, USA; Independent Researcher, Leander, TX; Texas A&M University-Victoria, Victoria, TX

AI summary: Proposes a hybrid architecture that combines FT-Transformer with XGBoost through calibration-aware stacking to handle class imbalance and feature interactions, reaching 62.10% F1 and 0.861 AUC-ROC on a bank churn dataset.

Comments 22 pages, 9 figures, 20 tables; published in IEEE Access

Journal ref IEEE Access, vol. 14, pp. 62834-62855, 2026

DOI: 10.1109/ACCESS.2026.3686374

Abstract

Customer churn prediction is essential across data-driven industries such as insurance, digital banking, eCommerce, and subscription platforms, where retaining existing customers is typically more cost-effective than acquiring new ones. Predicting churn on structured datasets remains challenging due to class imbalance, nonlinear feature interactions, and heterogeneous feature types. Tree-based ensemble methods consistently demonstrate strong performance in these contexts, often outperforming conventional neural networks. This study introduces a validated hybrid architecture that integrates feature-tokenized transformers (FT-Transformer) with gradient-boosted trees through calibration-aware stacking. The proposed framework addresses persistent gaps in statistical validation, probability calibration, and reproducibility found in prior research. The FT-Transformer captures higher-order feature interactions using self-attention, while XGBoost captures gradient-boosted decision boundaries with complementary inductive biases. Class imbalance is handled using class-weighted loss functions, thereby avoiding synthetic oversampling and preserving minority-class distributions. The models are ensembled using out-of-fold (OOF) stacking with a logistic regression meta-learner, which recalibrates overconfident base model outputs and learns optimal combination weights. On a public bank churn dataset, the hybrid model achieves 62.10% F1, 0.861 AUC-ROC, and 0.647 PR-AUC, outperforming the Multi-Layer Perceptron (MLP) baseline by 3.37 F1 points and 0.027 AUC under 5x5 cross-validation with 95% confidence intervals reported. Ablation studies demonstrate that both the transformer component and stacking strategy contribute materially to performance. The proposed methodology offers a reproducible and extensible reference architecture for contemporary churn prediction on structured tabular data.
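The abstract's point about class-weighted losses can be made concrete with a small sketch. It uses the common "balanced" heuristic, weight = n_samples / (n_classes * class_count), and a weighted binary log loss. The heuristic, the toy labels, and the function names are assumptions for illustration; the abstract does not say which weighting scheme the authors use.

```python
import math


def balanced_class_weights(labels):
    """Weights inversely proportional to class frequency (binary labels 0/1)."""
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    return {0: n / (2 * n_neg), 1: n / (2 * n_pos)}


def weighted_log_loss(labels, probs, weights, eps=1e-12):
    """Weighted binary cross-entropy, normalized by the total weight."""
    total, norm = 0.0, 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        w = weights[y]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
        norm += w
    return total / norm


# Toy data: 8 retained customers (0) and 2 churners (1).
labels = [0] * 8 + [1] * 2
weights = balanced_class_weights(labels)

# A model that ignores churners versus one that scores them correctly.
loss_ignoring_churners = weighted_log_loss(labels, [0.1] * 10, weights)
loss_finding_churners = weighted_log_loss(labels, [0.1] * 8 + [0.9] * 2, weights)
```

With these weights each class contributes the same total weight to the loss, so ignoring the minority class is penalized heavily without generating any synthetic samples.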

2606.07581 2026-06-09 cs.LG cs.AI cs.ET New submission

Training-Inference Kernel Contracts: Bounding Divergence in Post-Training and Deployment

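Bruce Changlong Xu, Lan Wu

Affiliations: University of California, Berkeley

AI summary: Proposes a kernel-contract framework that uses numerical, statistical, runtime, and observability clauses to constrain the distributional divergence between training and inference kernels, and derives a chain of bounds that ends in a bound on policy-gradient bias.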

Abstract

A modern post-training pipeline often writes one symbol for its policy, pi_theta, while evaluating it through two different programs: a training kernel optimized for autograd and an inference kernel optimized for low-precision, fused, dynamically batched serving. In finite precision, these kernels can induce different distributions at identical weights, with the gap concentrated on slices that aggregate benchmarks under-represent. This paper proposes kernel contracts: a contract-first framework for specifying acceptable divergence between K_train and K_inf. A contract C = (N, S, R, O, Pi) combines numerical, statistical, runtime, and observability clauses with an escalation policy from violations to routing actions. We derive a chain of bounds from logit drift to total-variation distance to bounded reward drift, and specialize it to RL post-training, where per-token importance-ratio drift yields a bound on policy-gradient bias under explicit support and norm assumptions. We also describe a four-stage promotion pipeline, online routing loop, and minimal YAML DSL for contract artifacts. This is a framework and vocabulary paper; we do not report production-scale empirical validation.
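The chain from logit drift to total-variation distance to reward drift can be sketched numerically. The bounds used here are elementary ones chosen for illustration, not necessarily the paper's: if every logit moves by at most eps, each softmax probability changes by a factor within [e^(-2 eps), e^(2 eps)], which gives TV <= (e^(2 eps) - 1) / 2, and for rewards bounded by r_max the expected reward moves by at most 2 * r_max * TV. The logit values, rewards, and function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def total_variation(p, q):
    """Total-variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def tv_bound_from_logit_drift(eps):
    """Upper bound on TV when max_i |z_i - z'_i| <= eps (elementary bound)."""
    return 0.5 * (math.exp(2.0 * eps) - 1.0)


def reward_drift_bound(tv, r_max):
    """Upper bound on |E_p[r] - E_q[r]| when |r| <= r_max."""
    return 2.0 * r_max * tv


# Hypothetical logits from a training kernel and a low-precision inference kernel.
z_train = [2.0, 1.0, 0.5, -1.0]
z_inf = [2.03, 0.98, 0.52, -1.04]
rewards = [1.0, -1.0, 0.5, 0.0]

eps = max(abs(a - b) for a, b in zip(z_train, z_inf))
p_train, p_inf = softmax(z_train), softmax(z_inf)
tv = total_variation(p_train, p_inf)
reward_gap = abs(sum(r * (a - b) for r, a, b in zip(rewards, p_train, p_inf)))
```

A numerical clause in a contract could then be checked by comparing the measured `eps` or `tv` against a threshold, with a violation triggering whatever escalation the contract specifies.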

2606.07578 2026-06-09 cs.LG stat.ME stat.ML New submission

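Enabling KV Caching of Shared Prefix for Diffusion Language Models

Younghun Go, Jaehoon Han, Changyong Shin, Chuk Yoo, Gyeongsik Yang

Affiliations: Korea University

AI summary: Bidirectional attention in diffusion language models makes shared-prefix KVs unstable. The paper proposes bidirectional prefix caching (bicache), which dynamically identifies a safe layer depth for KV reuse, avoids accuracy collapse, and improves throughput by 36.3%-98.3%.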

Abstract

Key-value (KV) caching for shared prefixes is essential for high-throughput large language model (LLM) serving, but it faces critical challenges in emerging diffusion language models (DLMs). In DLMs, bidirectional attention means that updating any token dynamically alters the entire context and its corresponding KVs. Thus, existing caching techniques developed for LLMs, which assume that KVs remain invariant once computed, corrupt the shared prefix KVs. Our experiments show that applying these techniques to DLMs causes model accuracy to collapse to near zero. To unlock high-throughput DLM serving, we propose bidirectional prefix caching, bicache, the first KV caching technique for shared prefixes in DLMs. bicache is designed based on key observations from our comprehensive analysis: shared prefix KVs remain stable and reusable in shallow layers, while the depth of shallow layers depends on the fraction of shared prefix tokens in each request. Thus, bicache dynamically identifies a safe layer depth for reusing shared prefix KVs and eliminates redundant computation. Evaluations demonstrate that bicache significantly improves serving throughput by 36.3%-98.3% compared to existing techniques without accuracy collapse (only 0-1.8% difference).

2606.07569 2026-06-09 cs.LG New submission

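Function-Vector Heads Are Two Populations: Writers and Cancellers in In-Context Learning

Han-yu Wang

Affiliations: The University of Hong Kong

AI summary: Finds that function-vector heads are not a homogeneous group but split into two sub-populations, writers and cancellers, which push the rule-correct logit up and down respectively; magnitude-only ranking cannot tell the two apart.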

Abstract

Function-vector (FV) heads (Todd et al., 2024) are typically identified by the magnitude of their causal contribution to in-context rule tasks, under the implicit assumption that the top set is a homogeneous functional class. This assumption fails. We replace magnitude-only ranking with a sign-preserving criterion (refined DLA + permutation FDR) and validate each candidate by path patching. The FV head population then splits into two opposing sub-populations: writers push the rule-correct logit up; cancellers push it down. A four-condition canonical verdict holds in $13/15$ cells across three model families and six Pythia scales, and a sign-shuffle rejects homogeneity in $5/6$ main cells. The structure is invisible to magnitude-only ranking: Todd's top-$20$ captures $64\%$ of cancellers but only $4\%$ of writers on the hierarchical task, and $59\%$ of writers but only $8\%$ of cancellers on the modular task. We rule out six artefact accounts on all $27$ canceller (cell, head) pairs: induction overlap, sinks, generic importance, rank-$1$ copy-suppression, V-cascade, and rank-nearest non-FV controls. Zero-ablating cancellers yields $+0.13$ to $+0.29$ nats of logit gain in $6/6$ main cells with a directionally consistent $+2$ to $+7$ pp accuracy effect.

2606.07559 2026-06-09 cs.CL cs.AI quant-ph New submission

Phantom transitions in language model fine-tuning

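Vaibhav Prakash, Jayasri Dontabhaktuni

Affiliations: Mahindra University

AI summary: Studies how fine-tuning fails when the correct completion competes with a near-synonym. An order parameter decomposes into a signal and a background drag, revealing two failure modes, and the apparent phase transitions turn out to be phantoms that arise from the softmax readout rather than from a geometric transition.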

Comments 26 pages, 9 figures

Abstract

Fine-tuning a language model on contexts whose correct completion has a near-synonym competitor often fails silently. The cross-entropy loss decreases monotonically while the correct token never overtakes the competitor in rank. We study this regime across five transformer architectures spanning two families and a fivefold parameter range, on ten hand-selected near-synonym contexts. We instrument these failures with an order parameter combining the predicted distribution and pairwise embedding overlaps. It decomposes additively into a signal, tracking the model's commitment to the correct token over its nearest competitor, and a background drag, set by how the embedding bulk leaks probability into the score. This isolates two failure modes. In kinematic failure the signal stays small. In structural failure the drag actively worsens as fine-tuning proceeds. We observe sharp catapult-like jumps in the order parameter that resemble a phase transition. A central negative result organises the paper. The transitions are phantoms. The spontaneous-symmetry-breaking interpretation is ruled out by direct measurement. Catapult-like jumps still appear under LoRA fine-tuning with the token embedding matrix exactly unchanged during training, where no geometric phase transition is possible. The discontinuity lives entirely in the softmax readout. A small number of dimensionless quantities organise the trajectory across architectures. One is consistent across all five under full fine-tuning. A second sorts architectures into two classes by bulk embedding distribution and predicts LoRA sufficiency. As a blind test, the framework predicts the critical learning rate of a held-out architecture, not used to fit any parameter, to within 2.1% of a subsequent learning-rate sweep. Findings concern the near-synonym mechanism only and should not be extrapolated without recalibration.
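The silent failure described in the abstract, where loss falls while rank never improves, is easy to reproduce in a toy softmax. The sketch below raises the correct token's logit step by step while a rival stays a fixed gap ahead. The vocabulary size, the gap, and the schedule are assumptions; this is not the paper's experiment or its order parameter.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def simulate_silent_failure(steps=5, gap=0.5, vocab=50):
    """Return (loss, rank) of the correct token at each toy training step."""
    history = []
    for t in range(steps):
        z_correct = float(t)       # the correct token's logit rises every step
        z_rival = z_correct + gap  # the near-synonym stays ahead by a fixed gap
        logits = [z_correct, z_rival] + [0.0] * (vocab - 2)
        probs = softmax(logits)
        loss = -math.log(probs[0])                        # cross-entropy on the correct token
        rank = 1 + sum(1 for q in probs if q > probs[0])  # 1 means top-ranked
        history.append((loss, rank))
    return history


history = simulate_silent_failure()
losses = [loss for loss, _ in history]
ranks = [rank for _, rank in history]
```

Probability mass drains from the rest of the vocabulary into both tokens, so the loss on the correct token keeps falling while the rival stays ranked first throughout.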

2606.07558 2026-06-09 cs.CV cs.AI cs.DL New submission

Page image classifier fine-tuned on century-spanning archives of scanned documents for further content-specific processing

Kateryna Lutsai, Pavel Straňák, David Novák, Dana Křivánková

Affiliations: Institute of Formal and Applied Linguistics, Charles University MFF; Institute of Archaeology, Czech Academy of Sciences

AI summary: Manual sorting is infeasible when digitizing historical documents, so the paper proposes an automatic page-image classifier based on visual content type (text, tables, graphics). Fine-tuned deep networks reach near-perfect accuracy (RegNetY-16GF: 99.16%), and the models, dataset, and code are publicly released.

Comments 29 pages, 19 figures, 13 tables. arXiv admin note: text overlap with arXiv:2507.21114

Abstract

Purpose: Digitization projects in the humanities produce vast, heterogeneous archives of historical documents, making manual sorting impractical at scale. This work addresses the need for an automated system to classify scanned page images based on visual content type - text, tables, and graphics - enabling content-specific downstream processing such as Optical Character Recognition (OCR) or structured data extraction. Methods: An image classification system was developed and evaluated on a dataset of over 48,000 annotated historical page images from century-old Czech archaeological archives, refined through four successive annotation stages with domain-expert review. A Random Forest Classifier baseline was established using hand-crafted image features. Subsequently, deep learning architectures were fine-tuned and compared: Convolutional Neural Networks (EfficientNetV2, RegNetY), Vision and Document Image Transformers (ViT, DiT), and multimodal CLIP models. An 11-category label scheme was designed collaboratively with domain experts and evaluated via five-fold cross-validation. Results: The feature-based baseline achieved approximately 75% accuracy. Fine-tuned CNNs and Transformers substantially outperformed it, with RegNetY-16GF achieving 99.16% and ViT-large 99.12% Top-1 accuracy on the held-out test set. CLIP ViT-B/16 reached 99.14% with optimized text descriptions. Conclusion: Image-only models, particularly RegNetY-16GF, deliver near-perfect classification accuracy and produce consistent labels across 649,508 unlabeled archival pages with over 90% inter-model agreement. Fine-tuned CLIP, despite competitive test-set accuracy, showed under 65% agreement with image-only models on unlabeled data, making it less suitable for deployment. The final models, annotated dataset, and software are publicly available under open-source licenses.

2606.07557 2026-06-09 cs.LG cs.MA cs.SI New submission

SPIN: Decentralized Swarm Control via Tensorized Policy Coordination

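Zhaowen Fan

Affiliations: Zhaowen Fan

AI summary: Proposes the SPIN framework, which factorizes the joint policy with a tensor network to reduce exponential complexity to linear, and uses an offline-trained neuro-symbolic pipeline for low-latency decentralized swarm control on edge devices.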

Comments 11 pages, 2 figures, 1 tables, 6 sections

Abstract

Decentralized multi-agent swarm coordination on resource-constrained edge platforms remains fundamentally bottlenecked by the exponential scaling of joint action spaces and high-latency communication overhead. This paper introduces the Swarm Policy Interference Network (SPIN) framework, an architectural paradigm that bypasses these limitations by modeling swarm topologies as a compressed tensor network. We factorize the joint policy tensors of local multi-agent cliques into Matrix Product State (MPS) chains, reducing the computational complexity of evaluation from an exponential $O(n^m)$ wall to a strictly linear $O(m \cdot n \cdot \chi^2)$ constraint. To bridge local continuous spatial geometry with this discrete algebraic backend without requiring power-intensive online training loops, we introduce a decoupled, hybrid neuro-symbolic control pipeline. Local multi-layered neural networks operate as structural coordination encoders, pre-trained offline to nonlinearly map hand-engineered geometric descriptors into abstract environmental target measures. At runtime, edge agents execute instantaneous behavioral adaptations by applying the Radon-Nikodým derivative directly as a zero-shot importance-reweighting filter. We validate the framework within a discrete-time multi-agent simulation sandbox spanning tracking, decentralized dispersion/area coverage, and multi-goal coordination regimes. Qualitative telemetry demonstrates that the integrated pipeline achieves stable target-directed motion, anti-collapse spatial spreading under decentralized constraints, and structured subgroup formation across multiple targets, providing a mathematically grounded route to tractable, low-power edge swarm intelligence.
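The complexity claim can be illustrated with a small matrix-product-state chain over m agents with n actions each and bond dimension chi. In this sketch the joint weight of an action tuple is read directly from the chain contraction with nonnegative cores, and the normalizer over all n^m tuples is obtained in O(m * n * chi^2) by summing each core over its action index before contracting. Treating the contraction as an unnormalized probability (rather than, say, a squared amplitude), the random cores, and all names are assumptions for illustration, not the SPIN implementation.

```python
import itertools
import random


def random_mps(m, n, chi, seed=0):
    """Build m cores; core k holds n nonnegative matrices, one per local action.

    Boundary cores are 1 x chi and chi x 1 so the chain contracts to a scalar.
    """
    rng = random.Random(seed)
    cores = []
    for k in range(m):
        rows = 1 if k == 0 else chi
        cols = 1 if k == m - 1 else chi
        cores.append([[[rng.random() for _ in range(cols)] for _ in range(rows)]
                      for _ in range(n)])
    return cores


def vecmat(vec, mat):
    """Row vector times matrix."""
    return [sum(vec[i] * mat[i][j] for i in range(len(vec)))
            for j in range(len(mat[0]))]


def weight(cores, actions):
    """Unnormalized joint weight of one action tuple, O(m * chi^2)."""
    vec = [1.0]
    for core, a in zip(cores, actions):
        vec = vecmat(vec, core[a])
    return vec[0]


def partition(cores):
    """Sum of weights over all n^m tuples, computed in O(m * n * chi^2)."""
    vec = [1.0]
    for core in cores:
        rows, cols = len(core[0]), len(core[0][0])
        summed = [[sum(mat[i][j] for mat in core) for j in range(cols)]
                  for i in range(rows)]
        vec = vecmat(vec, summed)
    return vec[0]


m, n, chi = 4, 3, 2
cores = random_mps(m, n, chi)
z_fast = partition(cores)
# Brute-force check over all n^m = 81 joint actions (only feasible for tiny m).
z_brute = sum(weight(cores, a) for a in itertools.product(range(n), repeat=m))
n_params = sum(len(mat) * len(mat[0]) for core in cores for mat in core)
```

For this toy size the chain stores 36 numbers against 81 entries in the full joint tensor; the gap widens exponentially as m grows while the chain grows linearly.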

2606.07553 2026-06-09 cs.LG cs.AI New submission

MedicalRec: Medical recommender system for image classification without retraining

Roghayeh Taghavi, Aysa Hasanazde Bashkandi, Amir Ali Bengari, Mohammad Amin Raji, Mohammad Salahi Ardekani, Parisa Mardukhian, Parvaneh Rezaei, Ramin Mousa

Affiliations: University of Tehran

AI summary: Proposes MedicalRec, a Transformer-based recommender that uses the MedicalRec-Bench dataset (over 5,000 records collected from 3,000 papers) to recommend models for medical image classification tasks without retraining, with a maximum HitRate@100 of 75.5%.

Abstract

The emergence of machine learning and deep learning has revolutionized the efficiency of diagnostic, therapeutic, and administrative systems in healthcare. However, this rapid adoption has come at the cost of requiring significant computing power and energy consumption, as well as e-waste disposal and carbon emissions. One of the challenges of these models is choosing the right model for classification tasks. To this end, researchers attempt to identify the optimal model using their data through trial and error, which involves energy consumption and waste. The goal of this study is to develop a model-based recommender system for medical image classification. For this purpose, a data set was collected from 3,000 articles in the field of medical image classification. This dataset, publicly available under the name MedicalRec-Bench, contains over 5,000 records of models tested in various tasks, including Skin Cancer Classification, Tumour Classification, Wound Classification, Breast Cancer, and MRI classification. The dataset was evaluated in four different modes, depending on the number of features: MedicalRec I (5 features), MedicalRec II (9 features), MedicalRec III (11 features), and MedicalRec IV (18 features). Collecting all values for the features is challenging due to non-reporting by the authors; hence, the dataset contains significant amounts of missing values. The Medical Recommender System (MedicalRec) is a transformer-based model used for item recommendations in this study. This model achieved remarkable results in the evaluation on the dataset and in the evaluation with 12 base models. This model achieved a maximum HitRate@100 of 75.5%. The dataset and implementations are available through the GitHub link: https://github.com/Ramin1Mousa/MedicalRec
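For readers unfamiliar with the reported metric, HitRate@K is commonly defined as the fraction of queries whose ground-truth item appears among the top K recommendations. The sketch below assumes one ground-truth model per query; the model names and rankings are made up for illustration and are not taken from MedicalRec-Bench.

```python
def hit_rate_at_k(ranked_lists, targets, k):
    """Fraction of queries whose target appears in the top-k of its ranked list."""
    hits = sum(1 for ranked, target in zip(ranked_lists, targets)
               if target in ranked[:k])
    return hits / len(ranked_lists)


# Hypothetical recommendations for four classification tasks.
ranked_lists = [
    ["resnet50", "vit_b16", "densenet121"],
    ["vit_b16", "resnet50", "efficientnet_b0"],
    ["densenet121", "efficientnet_b0", "resnet50"],
    ["efficientnet_b0", "densenet121", "vit_b16"],
]
# Hypothetical best-performing model reported for each task.
targets = ["vit_b16", "efficientnet_b0", "densenet121", "resnet50"]

hr1 = hit_rate_at_k(ranked_lists, targets, 1)
hr2 = hit_rate_at_k(ranked_lists, targets, 2)
hr3 = hit_rate_at_k(ranked_lists, targets, 3)
```

The metric can only grow with K, so a figure such as HitRate@100 is best read together with the size of the candidate pool.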

2606.07550 2026-06-09 cs.LG cs.AI New submission

Offline Reinforcement Learning for Plasma Control in Nuclear Fusion: Codebase and Benchmark

Yang Fu, Haomin Bao, Rohit Sonker, Xiaoyan Hu, Aravind Venugopal, Jeff Schneider, Jiayu Chen

Affiliations: Central South University; Chongqing University; Carnegie Mellon University; The University of Hong Kong

AI summary: Proposes the RL4F benchmark, which builds evaluation environments from historical data of the DIII-D tokamak and compares offline RL methods on plasma control tasks, finding that model-based offline RL performs best on average.

Comments 23 pages (10 pages main text)

Abstract

Offline reinforcement learning (RL) offers a promising route for developing plasma controllers from historical tokamak data, since online trial-and-error on real devices is costly and risky. However, progress in this direction remains difficult to measure due to the lack of a standardized offline RL benchmark for realistic multi-actuator, long-horizon plasma control problems in nuclear fusion. We introduce RL4F, an Offline Reinforcement Learning Benchmark for Plasma Control in Nuclear Fusion, providing closed-loop evaluation environments and baseline comparisons across four full-profile tracking tasks: rotation, density, temperature, and pressure. The dynamics function underlying the evaluation environment is built from historical discharge data from DIII-D, a real-world Tokamak. We evaluate a broad set of imitation learning and offline RL baselines under a unified protocol. We find that offline model-based RL methods obtain the best average performance on most objectives, although no single method dominates all tasks, highlighting the importance of dynamics modeling in complex, long-horizon plasma control tasks. To foster further research, we open-source the codebase, datasets, and evaluation framework, providing a benchmark not only for the fusion community but also for algorithm development in offline RL.

2606.07549 2026-06-09 cs.AI cs.MA New submission

PathoSage: Towards Multi-Source Evidence Adjudication in Pathology via Experience-Aware Agentic Workflow

Chengyang Zhang, Wenchuan Zhang, Bo Li, Mengran Li, Bob Zhang, Yuhao Yi, Hong Bu, Jiancheng Lv

Affiliations: College of Computer Science, Sichuan University; Department of Pathology and Institute of Clinical Pathology, West China Hospital, Sichuan University; Department of Computer and Information Science, University of Macau; School of Intelligent Systems Engineering, Sun Yat-sen University

AI summary: Proposes the PathoSage framework, which uses Structured Evidence Deliberation and a Beta-Bernoulli experience system to evaluate tool evidence independently and resolve conflicts, reducing hallucinations and classifier disagreement and making pathology reasoning more robust.

Abstract

Recent advances in Multimodal Large Language Models (MLLMs) and agent workflows have shown strong promise for computational pathology, yet reliable patch-level reasoning remains challenging. End-to-end pathology MLLMs often hallucinate morphological features, while recent agentic systems usually merge tool outputs and retrieved knowledge into a shared context, making decisions vulnerable to conflicting evidence and context contamination. We propose PathoSage, a three-stage framework that explicitly separates knowledge retrieval, evidence collection, and evidence adjudication for patch-level pathology multimodal reasoning. Its core component, Structured Evidence Deliberation, independently evaluates heterogeneous evidence from tools, performs conflict analysis, and generates the final judgment in a fresh context to reduce anchoring bias. We further introduce a training-free Beta-Bernoulli experience system with continuous credit assignment to model long-term tool reliability and construct similarity-weighted priors for future tool use. Experiments show that PathoSage effectively mitigates VQA hallucinations and classifier disagreement, outperforming strong pathology MLLM and agentic baselines. Our results highlight explicit evidence adjudication and reliability-aware tool modeling as key ingredients for robust pathology agents.
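One plausible reading of a training-free Beta-Bernoulli experience system with continuous credit is sketched below: each tool keeps Beta(alpha, beta) pseudo-counts, a credit c in [0, 1] adds c to alpha and 1 - c to beta, and a prior for a new case is assembled from past cases weighted by similarity. The update rule, the similarity weighting, and every name here are assumptions; the abstract does not give the paper's formulas.

```python
class ToolReliability:
    """Beta(alpha, beta) belief about how often a tool's evidence is reliable."""

    def __init__(self, alpha=1.0, beta=1.0):
        self.alpha = alpha
        self.beta = beta

    def update(self, credit):
        """Fractional Bernoulli update: credit 1.0 is a full success, 0.0 a full failure."""
        if not 0.0 <= credit <= 1.0:
            raise ValueError("credit must lie in [0, 1]")
        self.alpha += credit
        self.beta += 1.0 - credit

    def mean(self):
        """Posterior mean reliability."""
        return self.alpha / (self.alpha + self.beta)


def similarity_weighted_prior(experiences, base_alpha=1.0, base_beta=1.0):
    """Build a prior from (similarity, credit) pairs, both in [0, 1].

    Each past case contributes pseudo-counts scaled by its similarity to the new case.
    """
    alpha = base_alpha + sum(s * c for s, c in experiences)
    beta = base_beta + sum(s * (1.0 - c) for s, c in experiences)
    return ToolReliability(alpha, beta)


# A hypothetical classifier tool: one fully correct call, one half-credited call.
tool = ToolReliability()
tool.update(1.0)
tool.update(0.5)

# Prior for a new patch from two past cases: a very similar success and a
# moderately similar failure.
prior = similarity_weighted_prior([(1.0, 1.0), (0.5, 0.0)])
```

Because the updates are just pseudo-count increments, no gradient step or retraining is involved, which is consistent with the abstract's training-free description.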

2606.07547 2026-06-09 cs.CL cs.AI cs.SD New submission

Liberating LLM Capabilities in Full-Duplex Speech Models

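Luoyuan Zhang, Bokai Xu, Junbo Cui, Weiyue Sun, Yingjing Xu, Hanyu Liu, Yuan Yao

Affiliations: Royal Zhang

AI summary: Proposes the three-channel Listen-Write-Speak (LWS) paradigm, in which an LLM listens, writes visible text, and responds in speech in real time within a shared causal-attention context, enabling full-duplex interaction without architectural changes.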

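Post-training is (Massive) Supervised Learning

Michael Hassid, Yossi Adi, Roy Schwartz

Affiliations: FAIR, Meta AI; The Hebrew University of Jerusalem

AI summary: Argues that the current LLM post-training stage (SFT + RL) is in effect a return to the BERT-era "pre-train then fine-tune" paradigm, shows experimentally that models post-trained from scratch also reach notable performance, and proposes shifting toward training in which models "learn how to learn".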

Abstract

The prevailing paradigm for training LLMs has evolved to rely on a massive post-training phase consisting of SFT and RL. In this position paper, we argue that this methodology effectively marks a reversion to the "pre-train then fine-tune" approach of the BERT era, explicitly tailoring models to the desired behaviors and specific benchmarks on which they are evaluated. We begin with a historical overview of LLMs, describing the different phases of the LLM evolution. We argue that the current landscape is remarkably similar to the early days of LLMs, where task performance heavily relied on fitting the models to in-distribution datasets. To empirically demonstrate this, we compare pre-trained models to randomly initialized ones, by fine-tuning both variants on modern reasoning datasets and evaluating them on competitive math and code benchmarks. We show that models post-trained from scratch yield highly non-trivial performance. Our findings suggest that current post-training methodologies function primarily as a distribution-fitting mechanism. We finish by positing that developing generally capable models and systems requires moving beyond extensive post-training for predefined behaviors, shifting instead toward training procedures where models "learn how to learn".

2606.07526 2026-06-09 cs.CL cs.AI New submission

GraphLoRA: Structure-Aware Low-Rank Adaptation for Large Language Model Recommendation

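Lin Mu, Guoji Wang, Li Ni, Lei Sang, Zhize Wu, Peiquan Jin, Yiwen Zhang

Affiliations: Anhui University; Hefei University; University of Science and Technology of China

AI summary: Proposes the GraphLoRA framework, which embeds a trainable graph message-passing network in the low-rank adaptation pathway so that structural signals can propagate, integrating graph structure with textual semantics and improving LLM recommendation performance.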

Comments ACL 2026 findings

Abstract

Large Language Models (LLMs) have shown strong potential for recommendation (LLMRec) due to their powerful reasoning and generalization abilities. However, effectively aligning the textual semantics modeled by LLMs with the collaborative signals remains a key challenge. Existing methods either translate collaborative information into textual prompts or inject pre-trained embeddings into the LLM, both of which treat structural information as static input and fail to capture high-order relational dependencies. To bridge this gap, we propose GraphLoRA, a novel framework that generalizes low-rank adaptation from independent to structure-aware propagation. GraphLoRA embeds a trainable graph message-passing network within the low-rank adaptation pathway, enabling structural signals to propagate through the parameter space. This design allows collaborative topology to explicitly guide parameter updates, fostering deep integration between graph-structured and textual semantic information. Extensive experiments on multiple benchmarks demonstrate that GraphLoRA not only outperforms state-of-the-art LLM-based recommendation methods but also achieves superior generalization, effectively balancing structural reasoning capability with computational efficiency. Code is available at https://github.com/wgj15965/GraphLoRA.
