A Harmonic Mean Formulation of Average-Reward Reinforcement Learning in SMDPs

Erel Shtossel, Alicia Vidler, Uri Shaham, Gal A. Kaminka

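AI summary: For average-reward reinforcement learning in infinite-horizon, non-episodic tasks, the paper proposes a modified harmonic mean operator that addresses reward-rate computation in SMDPs when rewards and durations are non-stationary, proves its theoretical properties, and demonstrates its effectiveness.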

Journal ref: https://alaworkshop2026.github.io/papers/ALA2026_paper_57.pdf

Abstract

Recent research has revived and amplified interest in algorithms for undiscounted average reward reinforcement learning in infinite-horizon, non-episodic (continuing) tasks. Semi-Markov decision processes (SMDPs) are of particular interest. In SMDPs, discrete actions stochastically generate both rewards and durations, and the objective is to optimize the average reward rate. Existing algorithms approach this by optimizing the ratio of rewards to durations. However, when rewards and durations are non-stationary (in the infinite horizon), this can be incorrect. This paper presents a novel modified harmonic mean operator that correctly computes reward rates even under such conditions. This yields model-free learning algorithms that can work with SMDPs, while maintaining robustness to non-stationary reward and duration distributions over time. We prove theoretical properties of the modified harmonic mean operator, and empirically demonstrate its efficacy in comparison to existing algorithms.
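
As a rough illustration of why the choice of averaging operator matters here, the sketch below compares the long-run reward rate (total reward over total duration) with the naive mean of per-transition reward/duration ratios. It also checks the standard identity that the total-reward-over-total-time rate equals a reward-weighted harmonic mean of the per-transition rates. This is not the paper's modified harmonic mean operator, whose definition the abstract does not give; the function names and toy numbers are assumptions made for illustration.

```python
def reward_rate(rewards, durations):
    """Long-run reward rate: total reward divided by total elapsed time."""
    return sum(rewards) / sum(durations)


def mean_of_ratios(rewards, durations):
    """Arithmetic mean of per-transition rates r_i / tau_i."""
    return sum(r / d for r, d in zip(rewards, durations)) / len(rewards)


def weighted_harmonic_mean(values, weights):
    """Weighted harmonic mean; assumes strictly positive values and weights."""
    return sum(weights) / sum(w / v for w, v in zip(weights, values))


# Toy trajectory (hypothetical): two transitions with equal reward, unequal duration.
rewards = [1.0, 1.0]
durations = [1.0, 4.0]
rates = [r / d for r, d in zip(rewards, durations)]

print(reward_rate(rewards, durations))         # total reward / total time
print(mean_of_ratios(rewards, durations))      # over-weights the short transition
print(weighted_harmonic_mean(rates, rewards))  # matches reward_rate
```

With these numbers the mean of ratios is larger than the true rate, because the short, high-rate transition counts as much as the long one even though it covers far less time.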

2605.02207 2026-05-27 cs.CV cs.AI cs.LG

MultiSense-Pneumo: A Multimodal Learning Framework for Pneumonia Screening in Resource-Constrained Settings

Dineth Jayakody, Pasindu Thenahandi, Chameli Dommanige

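AI summary: Presents MultiSense-Pneumo, a multimodal prototype system that integrates symptoms, cough audio, speech, and chest radiographs, using interpretable late fusion for pneumonia screening and triage support.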

Abstract

Pneumonia remains a leading global cause of morbidity and mortality, particularly in low-resource settings where access to imaging, laboratory testing, and specialist care is limited. Clinical assessment relies on heterogeneous evidence, including symptoms, respiratory patterns, spoken descriptions, and chest imaging, making frontline screening inherently multimodal. However, many existing computational approaches remain unimodal and focus primarily on radiographs. In this work, we present MultiSense-Pneumo, a multimodal research prototype for pneumonia-oriented screening and triage support that integrates structured symptom descriptors, cough audio, spoken language, and chest radiographs. The system combines deterministic symptom triage, LightGBM-based acoustic classification, domain-adversarial radiograph analysis using ResNet-18, transformer-based speech recognition, and an interpretable late-fusion operator. Each modality is transformed into a normalized concern signal and aggregated into a unified screening estimate. The fusion weights are hand-specified and are treated as heuristic, interpretable parameters rather than learned or clinically optimized values. MultiSense-Pneumo is implemented with offline execution in mind on standard laptop-class hardware, but it is not presented as a deployment-validated or clinically validated diagnostic system. Experimental results demonstrate strong component-level performance of the radiograph pathway under synthetic domain shifts, while also highlighting important limitations, especially reduced abnormal-class recall for cough acoustics and the absence of paired end-to-end multimodal patient evaluation. MultiSense-Pneumo is therefore intended as a framework and component-level prototype for screening and triage research.
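
The late-fusion step described above, in which normalized per-modality concern signals are combined with hand-specified weights, might look roughly like the following sketch. The abstract does not state the aggregation formula, so the weighted average, the renormalization over available modalities, the modality keys, and the weight values are all assumptions for illustration, not the authors' implementation.

```python
from typing import Dict, Optional

# Hypothetical hand-specified weights; the paper's actual values are not given.
HYPOTHETICAL_WEIGHTS: Dict[str, float] = {
    "symptoms": 0.3,
    "cough": 0.2,
    "speech": 0.1,
    "radiograph": 0.4,
}


def late_fusion(
    signals: Dict[str, Optional[float]],
    weights: Dict[str, float] = HYPOTHETICAL_WEIGHTS,
) -> float:
    """Weighted average of concern signals in [0, 1].

    Modalities whose signal is None are skipped, and the remaining
    weights are renormalized so the result stays in [0, 1].
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for name, value in signals.items():
        if value is None:
            continue  # modality not available for this case
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"signal for {name!r} must lie in [0, 1]")
        weighted_sum += weights[name] * value
        total_weight += weights[name]
    if total_weight == 0.0:
        raise ValueError("no modality provided a signal")
    return weighted_sum / total_weight


full = late_fusion({"symptoms": 1.0, "cough": 0.0, "speech": 0.0, "radiograph": 1.0})
partial = late_fusion({"symptoms": 1.0, "cough": None, "speech": None, "radiograph": None})
print(full, partial)
```

Keeping the weights in a plain dictionary is what makes such a fusion rule easy to inspect, at the cost of not being tuned to any data.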

2605.08146 2026-05-27 cs.CV cs.AI

VT-Bench: A Unified Benchmark for Visual-Tabular Multi-Modal Learning

Zi-Yi Jia, Zi-Jian Cheng, Xin-Yue Zhang, Kun-Yang Yu, Zhi Zhou, Yu-Feng Li, Lan-Zhe Guo

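AI summary: Introduces VT-Bench, the first visual-tabular multimodal benchmark, covering 14 datasets across 9 domains and evaluating 23 models, and reveals the challenges of visual-tabular learning.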

2511.19741 2026-05-27 cs.CV

RAVE: Reallocating Visual Attention in Large Multimodal Models

Xi Leng, Xinhong Ma, Ziqiang Dong, Feng Zhang, Xiaoying Tang, Yang Yang, Guanjun Jiang

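AI summary: To address cross-modal misallocation and intra-visual imbalance in the standard attention mechanism of large multimodal models, the paper proposes RAVE, a lightweight pairwise gating mechanism that reallocates visual attention by learning query-key biases. It yields an average gain of 3 percentage points across multiple multimodal benchmarks, with the largest gains on perception-intensive tasks.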

2605.17774 2026-05-27 cs.CL

Internalizing Tool Knowledge in Small Language Models via QLoRA Fine-Tuning

Yuval Shemla, Ayal Yakobe, Tanmay Agarwal, Dhaval Patel, Kaoutar El Maghraoui

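AI summary: Studies internalizing tool knowledge in small language models through parameter-efficient QLoRA fine-tuning. On the AssetOpsBench benchmark, fine-tuned Gemma 4 E4B and Qwen3-4B models under description-free inference outperform an unfine-tuned baseline given full tool descriptions, cutting input length by 82.6% while improving planning scores.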

Abstract

Large language models are increasingly used as planning components in agentic systems, but current tool-use pipelines often require full tool schemas to be included in every prompt, creating substantial token overhead and limiting the practicality of smaller models. This paper investigates whether tool-use knowledge can be internalized into small language models through parameter-efficient fine-tuning, enabling structured planning without explicit tool descriptions at inference time. Using AssetOpsBench as the primary benchmark, we fine-tune Gemma 4 E4B and Qwen3-4B with 8-bit QLoRA on approximately 1,700 tool-use examples spanning tool knowledge, question-to-plan mappings, and execution-style traces. We evaluate the resulting models under description-free inference, where the prompt omits the tool catalog entirely. The fine-tuned models outperform an informed unfine-tuned baseline that receives full tool descriptions, reducing input length by 82.6% while improving structural and LLM-judge planning scores. In the best Gemma run, the model achieves an AT-F1 of 0.65 and an overall judge score of 3.88, compared with 0.47 and 2.88 for the informed baseline. Qwen3-4B achieves a strong overall judge score of 3.78 while using 62% less memory and running 2.5x faster than Gemma, though it also exhibits greater catastrophic forgetting on general multiple-choice benchmarks. Additional ablations show that LoRA rank controls a quality-retention trade-off, with $r=32$ maximizing planning quality and smaller ranks preserving more general knowledge. These results suggest that, for fixed tool catalogs, QLoRA fine-tuning can shift tool knowledge from prompt context into model weights, substantially reducing inference overhead while maintaining or improving tool-planning quality.

2605.17617 2026-05-27 cs.AI

Dynamic Adversarial Fine-Tuning Reorganizes Refusal Geometry

Wenhao Lan, Shan Li, Xinhua Lai, Meiqi Wu, Junbin Yang, Haihua Shen, Yijun Yang

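AI summary: Studies how dynamic adversarial fine-tuning changes the causal control carriers of refusal behavior (low-dimensional subspaces) in safety-aligned language models, finding that R2D2 reorganizes the geometry along a robustness-utility frontier without establishing adaptive robustness.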

Abstract

Safety-aligned language models must refuse harmful requests without broad over-refusal, but it remains unclear how dynamic adversarial fine-tuning changes refusal-control carriers: Kullback--Leibler (KL)-constrained directions or small subspaces that causally modulate refusal without large safe-prompt distribution shifts. We study a 7B backbone under supervised fine-tuning (SFT) and Robust Refusal Dynamic Defense (R2D2), aligning HarmBench, StrongREJECT, and XSTest evaluations with five-anchor geometry measurements, causal interventions, and sparse adaptive stress tests. R2D2 drives fixed-source HarmBench attack success to zero at early checkpoints; however, these checkpoints also exhibit maximal XSTest refusal and fail a benign-utility audit. Later checkpoints partially recover utility-facing behavior while reopening attack success, with adaptive GCG attack success rate rising to 0.415 at step 250 and 0.613 at step 500. Internally, R2D2 preserves a late-layer admissible refusal-control carrier through step 100 and then relocates the best admissible carrier to an early layer; SFT relocates earlier yet remains less robust. Effective rank stays near 1.24, and SFT shows larger principal-angle drift, arguing against both dimensional expansion and drift magnitude as sufficient explanations. Causal interventions support a low-dimensional but utility-coupled carrier. These results support a geometry-reorganization account of R2D2 along a robustness--utility frontier, without establishing adaptive robustness.
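
The abstract reports an effective rank near 1.24 without defining the quantity. One common definition, assumed here and not necessarily the one the authors use, is the exponential of the Shannon entropy of the normalized singular values. The sketch below computes it from a list of singular values; the function name and the example spectra are hypothetical.

```python
import math


def effective_rank(singular_values):
    """Entropy-based effective rank: exp(-sum p_i * log p_i).

    p_i are the singular values normalized to sum to 1. Zero singular
    values contribute nothing, following the 0 * log 0 = 0 convention.
    """
    total = sum(singular_values)
    if total <= 0:
        raise ValueError("need at least one positive singular value")
    entropy = 0.0
    for s in singular_values:
        if s > 0:
            p = s / total
            entropy -= p * math.log(p)
    return math.exp(entropy)


print(effective_rank([1.0, 0.0, 0.0]))  # one dominant direction
print(effective_rank([1.0, 1.0]))       # two equally strong directions
print(effective_rank([10.0, 1.0]))      # mostly one direction plus a weak second
```

Under this definition, a value only slightly above 1 indicates that a single direction carries most of the spectrum, which is consistent with the abstract's description of a low-dimensional carrier.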

2601.15891 2026-05-27 cs.CV

Can Vision Mamba Improve AI-Generated Image Detection? An In-Depth Study

Mamadou Keita, Wassim Hamidouche, Hessen Bougueffa Eutamene, Abdelmalik Taleb-Ahmed, Xianxun Zhu, Abdenour Hadid

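AI summary: Systematically evaluates Vision Mamba models for AI-generated image detection, comparing them with CNN, ViT, and VLM-based detectors and analyzing accuracy, efficiency, and generalization.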

Abstract

In recent years, computer vision has witnessed remarkable progress, fueled by the development of innovative architectures such as Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), diffusion-based architectures, Vision Transformers (ViTs), and, more recently, Vision-Language Models (VLMs). This progress has undeniably contributed to creating increasingly realistic and diverse visual content. However, such advancements in image generation also raise concerns about potential misuse in areas such as misinformation, identity theft, and threats to privacy and security. In parallel, Mamba-based architectures have emerged as versatile tools for a range of image analysis tasks, including classification, segmentation, medical imaging, object detection, and image restoration, in this rapidly evolving field. However, their potential for identifying AI-generated images remains relatively unexplored compared to established techniques. This study provides a systematic evaluation and comparative analysis of Vision Mamba models for AI-generated image detection. We benchmark multiple Vision Mamba variants against representative CNNs, ViTs, and VLM-based detectors across diverse datasets and synthetic image sources, focusing on key metrics such as accuracy, efficiency, and generalizability across diverse image types and generative models. Through this comprehensive analysis, we aim to elucidate Vision Mamba's strengths and limitations relative to established methodologies in terms of applicability, accuracy, and efficiency in detecting AI-generated images. Overall, our findings highlight both the promise and current limitations of Vision Mamba as a component in systems designed to distinguish authentic from AI-generated visual content. This research is crucial for enhancing detection in an age where distinguishing between real and AI-generated content is a major challenge.

2605.14664 2026-05-27 cs.CV

MiVE: Multiscale Vision-language features for reference-guided video Editing

Tong Wang, Meng Zou, Chengjing Wu, Xiaochao Qu, Luoqi Liu, Xiaolin Hu, Ting Liu

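AI summary: Proposes the MiVE framework, which feeds multiscale hierarchical VLM features (early layers preserve spatial detail, deeper layers encode global semantics) into a unified self-attention Diffusion Transformer, addressing the modality gap and the loss of fine-grained information, and achieving state-of-the-art performance in reference-guided video editing.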

Comments ICML 2026

Abstract

Reference-guided video editing takes a source video, a text instruction, and a reference image as inputs, requiring the model to faithfully apply the instructed edits while preserving original motion and unedited content. Existing methods fall into two paradigms, each with inherent limitations: decoupled encoders suffer from modality gaps when processing instructions and visual content independently, while unified vision-language encoders lose fine-grained spatial details by relying solely on final-layer representations. We observe that VLM layers encode complementary information hierarchically -- early layers capture localized spatial details essential for precise editing, while deeper layers encode global semantics for instruction comprehension. Building on this insight, we present MiVE (Multiscale Vision-language features for reference-guided video Editing), a framework that repurposes VLMs as multiscale feature extractors. MiVE extracts hierarchical features from Qwen3-VL and integrates them into a unified self-attention Diffusion Transformer, eliminating the modality mismatch inherent in cross-attention designs. Experiments demonstrate that MiVE achieves state-of-the-art performance by ranking highest in human preference, outperforming both academic methods and commercial systems.

2605.14480 2026-05-27 cs.CL

Cross-Linguistic Transcription and Phonological Representation in the Huìtóngguǎnxì Huáyíyìyǔ

Ji-eun Kim

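AI summary: Treats the Huìtóngguǎnxì Huáyíyìyǔ as a coherent multilingual transcription system; through digitization and phonological analysis, it reveals cross-linguistic regularities in the main and supplementary transcriptions and argues for the system's value as evidence for historical phonology.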

Comments 49 pages; 1 figure; 40 tables; SLE2019; under review

Abstract

Purpose: This study investigates the transcription principles underlying Huìtóngguǎnxì Huáyíyìyǔ (HHY), a series of multilingual glossaries compiled by the Ming government between the fifteenth and sixteenth centuries for interpreter training. The study treats HHY not as a collection of isolated language materials, but as a coherent multilingual transcription system representing spoken forms of non-Chinese languages through Chinese characters. Methods: A substantial portion of HHY was digitized and aligned with Chinese phonological categories. Previous reconstructions of individual language sections were critically reviewed and integrated into a unified comparative database. The analysis focuses on cross-linguistic regularities in Main Transcription (MT) and Supplementary Transcription (ST) across eight language sections. Results: MT generally represents sounds compatible with the Chinese syllable structure of the period, whereas ST mainly encodes phonetic features less compatible with Chinese phonology. The analysis further shows that Chinese phonological categories were used more flexibly in foreign-language transcription than previously assumed. HHY therefore functioned as a relatively systematic method of phonetic approximation rather than a direct projection of Chinese phonology onto non-Chinese languages. Conclusion: HHY can be analyzed as an internally structured transcription system rather than merely as a collection of glossaries. More broadly, the study demonstrates that historical transcription systems can provide valuable evidence for historical phonology, particularly for under-documented Asian languages with limited historical records.

2605.13779 2026-05-27 cs.LG cs.AI cs.DC

MinT: Managed Infrastructure for Training and Serving Millions of LLMs

Mind Lab: Song Cao, Vic Cao, Andrew Chen, Kaijie Chen, Cleon Cheng, Steven Chiang, Kaixuan Fan, Hera Feng, Huan Feng, Arthur Fu, Jun Gao, Hongquan Gu, Aaron Guan, Nolan Ho, Mutian Hong, Hailee Hou, Peixuan Hua, Charles Huang, Miles Jiang, Nora Jiang, Yuyi Jiang, Qiuyu Jin, Fancy Kong, Andrew Lei, Kyrie Lei, Alexy Li, Lucian Li, Ray Li, Theo Li, Zhihui Li, Jiayi Lin, Kairus Liu, Kieran Liu, Logan Liu, Xiang Liu, Irvine Lu, Maeve Luo, Runze Lv, Pony Ma, Verity Niu, Anson Qiu, Vincent Wang, Rio Yang, Maxwell Yao, Carrie Ye, Regis Ye, Wenlin Ye, Josh Ying, Danney Zeng, Yuhan Zhan, Anya Zhang, Di Zhang, Ruijia Zhang, Sueky Zhang, Ya Zhang, Wei Zhao, Ada Zhou, Changhai Zhou, Yuhua Zhou, Xinyue Zhu, Murphy Zhuang

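AI summary: Presents MinT, a system that manages LoRA adapters to enable efficient training and online serving on top of large base models, supporting policy catalogs at the million scale.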

Comments 30 pages, technical report

Abstract

We present MindLab Toolkit (MinT), a managed infrastructure system for Low-Rank Adaptation (LoRA) post-training and online serving. MinT targets a setting where many trained policies are produced over a small number of expensive base-model deployments. Instead of materializing each policy as a merged full checkpoint, MinT keeps the base model resident and moves exported LoRA adapter revisions through rollout, update, export, evaluation, serving, and rollback, hiding distributed training, serving, scheduling, and data movement behind a service interface. MinT scales this path along three axes. Scale Up extends LoRA RL to frontier-scale dense and MoE architectures, including MLA and DSA attention paths, with training and serving validated beyond 1T total parameters. Scale Down moves only the exported LoRA adapter, which can be under 1% of base-model size in rank-1 settings; adapter-only handoff reduces the measured step by 18.3x on a 4B dense model and 2.85x on a 30B MoE, while concurrent multi-policy GRPO shortens wall time by 1.77x and 1.45x without raising peak memory. Scale Out separates durable policy addressability from CPU/GPU working sets: a tensor-parallel deployment supports 10^6-scale addressable catalogs (measured single-engine sweeps through 100K) and thousand-adapter active waves at cluster scale, with cold loading treated as scheduled service work and packed MoE LoRA tensors improving live engine loading by 8.5-8.7x. MinT thus manages million-scale LoRA policy catalogs while training and serving selected adapter revisions over shared 1T-class base models.
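
The statement that a rank-1 adapter can be under 1% of base-model size is consistent with LoRA's parameter count: adapting a d_out x d_in weight matrix at rank r adds r * (d_in + d_out) parameters. The sketch below works this out for a single square projection. The 4096 dimension is a hypothetical example rather than a figure from the report, and a real adapter's overall ratio also depends on which matrices are adapted.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by LoRA factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def lora_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Adapter parameters as a fraction of the dense d_out x d_in matrix."""
    return lora_param_count(d_in, d_out, rank) / (d_in * d_out)


# Hypothetical 4096 x 4096 projection at a few ranks.
for rank in (1, 8, 32):
    frac = lora_fraction(4096, 4096, rank)
    print(f"rank={rank:>2}: {lora_param_count(4096, 4096, rank):>7} params, "
          f"{100 * frac:.3f}% of the dense matrix")
```

Because the adapter grows linearly in the rank while the dense matrix grows with the product of its dimensions, moving only the adapter between training and serving stays cheap even for very large base models.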
