arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.10662 2026-06-10 cs.MA cs.AI New submission

Decentralized Multi-Agent Systems with Shared Context

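Yuzhen Mao, Azalia Mirhoseini

AI summary: Proposes the DeLM framework, which decentralizes coordination through parallel agents, a shared context, and a task queue to remove the bottleneck of centralized MAS, improving performance and lowering cost on software engineering and long-context reasoning.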

Abstract

Multi-agent systems (MAS) can scale large language model reasoning at test time by decomposing complex problems into parallel subtasks. However, most existing MAS rely on centralized orchestration, where a main agent assigns work, collects outputs, and merges results. As the number of subtasks grows, this controller becomes a communication and integration bottleneck. We propose Decentralized Language Models (DeLM), a MAS framework that decentralizes coordination through parallel agents, a shared verified context, and a task queue. Agents asynchronously claim subtasks, read accumulated progress, perform local reasoning, and write back compact verified updates. The shared context acts as a common communication substrate, enabling agents to build on one another's verified progress without routing every update through a central controller. Empirically, DeLM improves both software-engineering test-time scaling and long-context reasoning. On SWE-bench Verified, DeLM achieves the best performance across Avg.@1, Pass@2, and Pass@4, with gains of up to 10.5 percentage points over the strongest baseline, while reducing cost per task by roughly 50%. On LongBench-v2 Multi-Doc QA, DeLM achieves the highest average accuracy across four frontier model families, improving over the strongest baseline by up to 5.7 percentage points. The code is available on our project website at https://yuzhenmao.github.io/DeLM/.
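
The claim, read, reason, write-back loop described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `SharedContext`, `agent`, and `run`, the use of `asyncio`, and the toy "reasoning" and "verification" steps are all assumptions made for illustration.

```python
import asyncio

class SharedContext:
    """Hypothetical append-only store of verified updates shared by all agents."""
    def __init__(self):
        self.updates = []
        self._lock = asyncio.Lock()

    async def read(self):
        async with self._lock:
            return list(self.updates)

    async def write(self, update):
        async with self._lock:
            self.updates.append(update)

async def agent(name, queue, context):
    while True:
        try:
            subtask = queue.get_nowait()    # claim a subtask; no central controller assigns it
        except asyncio.QueueEmpty:
            return
        progress = await context.read()     # read accumulated progress
        result = subtask * subtask          # stand-in for local reasoning
        await asyncio.sleep(0)              # yield so that agents interleave
        if result >= 0:                     # stand-in for verification
            await context.write((name, subtask, result, len(progress)))
        queue.task_done()

async def run(subtasks, n_agents=3):
    queue = asyncio.Queue()
    for subtask in subtasks:
        queue.put_nowait(subtask)
    context = SharedContext()
    await asyncio.gather(*(agent(f"agent-{i}", queue, context) for i in range(n_agents)))
    return context.updates

updates = asyncio.run(run([1, 2, 3, 4, 5]))
results = sorted(result for _, _, result, _ in updates)
```

Each update records how many verified entries the agent saw before it started, which is one simple way to observe agents building on earlier progress.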

2606.10660 2026-06-10 cs.CY cs.AI New submission

Accounting for AI Inference in Corporate GHG Inventories: A Four-Tier Methodology for Scope 3 Category 1 Reporting

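Guillermo Llopis

AI summary: Addresses the lack of a standard method for accounting for AI inference services under Scope 3 Category 1 as required by the CSRD. Proposes a four-tier framework built on token-based physical estimation, which estimates emissions from GPU energy benchmarks and regional grid carbon intensities and exposes a water-carbon trade-off.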

Comments
Preprint. Data repository: https://doi.org/10.5281/zenodo.20443586. 18 pages, 3 figures, 6 tables
Abstract

AI inference services -- API subscriptions, enterprise chat tools, and SaaS products with embedded AI features -- fall unambiguously within Scope 3 Category 1 under the Corporate Sustainability Reporting Directive (CSRD), which requires disclosure for fiscal years starting January 2024. Yet no standardised methodology exists for including them in corporate GHG inventories. Current practice either omits the category entirely or applies a generic economic input-output (EEIO) factor calibrated to the ICT sector as a whole, overestimating AI inference emissions by 10-40x relative to physically derived alternatives. We propose a four-tier framework that matches estimation precision to the data organisations can realistically obtain, progressing from direct token-based physical estimation -- using GPU energy benchmarks and regional grid carbon intensities -- down to a spend-based EEIO fallback for services where no usage data exists. Emission factors are derived from peer-reviewed GPU energy benchmarks (ML.ENERGY Leaderboard v3), confirmed grid carbon intensities (EPA eGRID 2023; Ember 2023), and published water use effectiveness data (Li et al., 2025). Applied to a 200-person European firm, the framework yields a total below 1 tCO2e, illustrating that the compliance challenge is methodological rather than magnitude-driven. We further document a water-carbon trade-off that current ESG tools do not surface: Sweden's hydro-dominated grid delivers the lowest carbon intensity in our dataset but the highest water footprint, with direct implications for data centre location strategy.
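
The token-based tier mentioned in this abstract amounts to a chain of unit conversions: tokens to energy, energy to emissions. The sketch below shows that arithmetic only. The function name, the `overhead` multiplier, and every numeric input are placeholder assumptions, not values or factors from the paper.

```python
JOULES_PER_KWH = 3.6e6

def token_based_emissions_kg(tokens, joules_per_token, grid_g_co2e_per_kwh, overhead=1.0):
    """Estimate kg CO2e for an inference workload from its token count.

    overhead is a hypothetical multiplier for non-GPU energy (1.0 means none).
    """
    energy_kwh = tokens * joules_per_token * overhead / JOULES_PER_KWH
    return energy_kwh * grid_g_co2e_per_kwh / 1000.0

# Placeholder inputs chosen for round numbers, not taken from the paper.
estimate = token_based_emissions_kg(
    tokens=1_000_000_000,
    joules_per_token=0.36,
    grid_g_co2e_per_kwh=300.0,
)
```

With these placeholder inputs the workload uses 100 kWh, so the estimate is 30 kg CO2e. Swapping in a different grid intensity changes the result proportionally.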

2606.10658 2026-06-10 cs.CR cs.AI cs.CE q-fin.CP New submission

Post-Quantum Secure Federated DeFi for Inclusive Banking

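Swati Sachan, Dale Fickett, Richard Buchinger, Theo Miller

AI summary: Proposes a post-quantum secure federated DeFi framework that uses lattice-based fully homomorphic encryption and the NASA-IBM geospatial foundation model to enable encrypted inter-bank collaboration, improving financial inclusion for individuals with limited credit history.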

Abstract

Recent advances in error-corrected qubits have accelerated the timeline for practical quantum computing, which threatens the cryptographic primitives used to secure financial systems, government infrastructure, communication networks, and DeFi (Decentralized Finance) ecosystems. This paper introduces a post-quantum secure federated DeFi framework that enables inter-bank collaboration to improve the inclusion of individuals underserved by local lenders due to limited financial histories. Multiple banks contribute encrypted information batches to a virtual server, where lattice-based Fully Homomorphic Encryption (FHE) enables end-to-end homomorphic computation. The server fuses, in encrypted form, local data-driven probabilistic assessments, expert beliefs, and verifiable evidence generated by the NASA-IBM Prithvi Geospatial Foundation Model (GFM). Decentralized technologies are employed to ensure tamper-proof evidence and auditable accountability for all encrypted data exchanges between institutions and the server. The framework is tested on agricultural lending decisions for rural borrowers in Virginia.

2606.10627 2026-06-10 cs.HC cs.LG cs.SD New submission

Profy: Interpretable Visualization of Expertise-Dependent Motor Skills Toward Supporting Piano Practice

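Kazuki Kawamura, Fujiki Nakamura, Hayato Nishioka, Momoko Shioki, Shinichi Furuya, Jun Rekimoto

AI summary: Proposes Profy, a weakly supervised system that learns time-aligned highlights from listener-rating labels to help piano learners locate passages that need focused practice. Without localized labels, its highlights align well with expert annotations.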

Comments
Designing Interactive Systems Conference (DIS '26), June 13-17, 2026, Singapore, Singapore
Abstract

The quality of piano performance depends on nuanced timing, articulation, and dynamic control, but practice feedback is often summary-based and hard to act on. We introduce Profy, a weakly supervised system that learns from take-level labels derived from aggregated listener ratings (expert-labeled vs. amateur-labeled) to produce time-aligned highlights for review during piano practice. We collected synchronized 1 kHz key-motion and audio from 73 pianists and used 1,083 valid takes for modeling and evaluation. The model outputs clip-level predictions together with evidence scores on a shared resampled model time base for visualization. On 20 amateur clips from short technique studies annotated by 21 expert pianists, the displayed highlight score aligns with passages that expert pianists marked for review despite training without localized labels (Pearson r=0.61, ROC-AUC 0.75). Rather than summarizing a take with a single global score, Profy helps learners decide where to inspect next by supporting scrubbing, looping, and focused replay of time-localized passages associated with expert-amateur differences.

2606.10621 2026-06-10 cs.IR cs.AI New submission

STORM: Stepwise Token Optimization with Reward-Guided Beam Search

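Arthur Satouf, Giulio D'Erasmo, Yuxuan Zong, Habiboulaye Amadou Boubacar, Pablo Piantanida, Benjamin Piwowarski

AI summary: Proposes the STORM framework, which optimizes token selection at each step through beam search guided by a retrieval reward to perform query expansion for lexical retrieval. It matches or surpasses large-model rewriters on several benchmarks and transfers zero-shot to 18 languages.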

Abstract

Modern retrieval increasingly relies on dense and learned-sparse neural models that are effective but require encoding the entire corpus into a specialized index, rebuilt whenever the model changes. Lexical retrievers like BM25 stay efficient and transparent on a standard inverted index that need not change as models evolve, but suffer from vocabulary mismatch. LLM query rewriting can help, yet prompted rewriters emit well-formed but retrieval-ineffective or harmful terms, and training against a retrieval reward gives only delayed, sequence-level supervision that obscures which terms helped. We introduce STORM (Stepwise Token Optimization with Reward-guided beaM search), a self-supervised framework for lexical query expansion. STORM trains the rewriter through generation guided by retrieval metrics: at each step, candidate expansions are scored against the BM25 index and low-reward continuations are pruned, turning the retrieval reward into a token-level signal that concentrates exploration on retrieval-effective vocabulary. Across TREC DL and BEIR, STORM lets 0.6B-8B backbones match or surpass competitive LLM rewriters while retrieving as fast as plain BM25; at 8B it rivals far larger proprietary rewriters. It further transfers zero-shot to 18 languages (MIRACL), beating dedicated multilingual dense retrievers on average, making STORM a competitive, infrastructure-light alternative to dense neural retrieval.
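
The step-wise scoring and pruning described in this abstract can be illustrated with a toy beam search. This is a sketch under strong assumptions, not the STORM method itself: there is no language model and no training, a term-overlap count stands in for BM25, the reward is the reciprocal rank of one known relevant document, and all names (`overlap_score`, `retrieval_reward`, `reward_guided_beam`) and data are hypothetical.

```python
def overlap_score(query_terms, doc_terms):
    """Stand-in for BM25: number of query terms present in the document."""
    return sum(1 for term in query_terms if term in doc_terms)

def retrieval_reward(query_terms, corpus, relevant_id):
    """Reciprocal rank of the relevant document under overlap_score."""
    ranked = sorted(corpus, key=lambda d: (-overlap_score(query_terms, corpus[d]), d))
    return 1.0 / (ranked.index(relevant_id) + 1)

def reward_guided_beam(query, vocabulary, corpus, relevant_id, steps=2, beam_width=2):
    beams = [tuple(query)]
    for _ in range(steps):
        candidates = []
        for beam in beams:
            for term in vocabulary:
                if term not in beam:
                    expanded = beam + (term,)
                    reward = retrieval_reward(expanded, corpus, relevant_id)
                    candidates.append((reward, expanded))
        # Keep the highest-reward continuations and prune the rest.
        candidates.sort(key=lambda c: (-c[0], c[1]))
        beams = [expanded for _, expanded in candidates[:beam_width]]
    return beams[0]

corpus = {
    "d1": {"car", "engine", "repair"},
    "d2": {"automobile", "motor", "maintenance"},
    "d3": {"car", "rental", "price"},
}
best = reward_guided_beam(["car", "fix"], ["automobile", "rental", "motor"], corpus, "d2")
```

In this toy corpus the query shares no terms with the relevant document `d2`, so the search only reaches rank 1 after adding terms that close the vocabulary gap, while the `rental` continuation is pruned because it promotes `d3`.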

2606.10620 2026-06-10 cs.CV cs.AI New submission

Can Image Models Imagine Time? ImageTime: A Novel Benchmark for Probing Visual World Modeling Through Spatiotemporal Consistency

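Xinrui Wu, Lichen Huang

AI summary: Proposes the ImageTime benchmark, which uses a four-keyframe protocol (initial state, action onset, transition state, final state) to evaluate the spatiotemporal consistency of image generation models, revealing their strengths and weaknesses in maintaining coherent visual world states.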

Abstract

Image generation models now produce high-quality static images, yet their ability to represent how a visual world changes over time remains poorly understood. Practical workflows such as storyboarding, step-by-step illustration, reference-guided editing, and video previsualization require models to preserve identities, objects, spatial relations, and causal order across multiple visual states. Existing evaluations largely measure single-image correctness, compositional alignment, or video quality, leaving open whether an image model can coherently imagine a temporally ordered process. We introduce ImageTime, a diagnostic benchmark that uses spatiotemporal consistency as a behavioral probe of visual world modeling in image generation. Given an action instruction, and optionally a reference image specifying the initial state, a model must generate one image containing four ordered key states: initial state, action onset, transition state, and final state. This four-keyframe protocol is more temporally demanding than single-image generation while avoiding the confounds of dense video dynamics. ImageTime organizes tasks with a progressive capability hierarchy and decomposes each scenario into stage-wise state predicates, cross-frame temporal constraints, and forbidden causal violations. GPT-5.5 scores all generated images under a structured VLM-as-judge protocol, producing interpretable capability scores, diagnostic subscores, and failure labels. Through multi-family benchmarking, ImageTime reveals where current image generation systems succeed, fail, and drift when asked to maintain coherent visual world states over time.

2606.10617 2026-06-10 cs.CV New submission

SSR-Merge: Subspace Signal Routing for Training-Free LoRA Merging in Diffusion Models

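Zhengxuan Wei, Yi Dong, Zonghui Li, Xianhui Lin, Xing Liu, Hong Gu, Shaofeng Zhang, Wenbin Li, Qi Fan

AI summary: Proposes Subspace Signal Routing (SSR), which builds a unified subspace by concatenating LoRAs along the rank dimension, then uses an inverse correlation matrix for decorrelation and a directional guide matrix to separate signals, resolving interference in parameter merging. The method is proven equivalent to the OLS optimal solution, and a streaming algorithm reduces overhead.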

Comments
Accepted at ICML 2026
Abstract

Low-Rank Adaptation (LoRA) merging can efficiently combine diverse generative capabilities from multiple trained LoRAs for a diffusion model. However, existing LoRA merging techniques often suffer from severe parameter interference, causing destructive collisions in the shared parameter space. To address this, we propose Subspace Signal Routing (SSR), which resolves interference by routing internal signals instead of performing parameter-space merge. Specifically, SSR first constructs a unified subspace by concatenating candidate LoRAs along the rank dimension. Next, SSR employs an inverse correlation matrix to decorrelate mixed signals within this space. Finally, a directional guide matrix steers these purified signals into their respective task-specific subspaces. We provide a rigorous theoretical analysis proving that SSR aligns with the Ordinary Least Squares (OLS) solution, thereby ensuring mathematical optimality. We utilize the additivity of sufficient statistics to design a streaming algorithm. This enables on-the-fly updates that significantly reduce memory overhead and computation time. Extensive experiments validate that SSR significantly outperforms state-of-the-art methods while maintaining comparable efficiency. Code is available at https://github.com/nagara214/SSR-Merge.
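
The first step named in this abstract, concatenating LoRAs along the rank dimension, can be sketched in plain Python. The example only demonstrates the standard identity that stacking the factors reproduces the sum of the individual low-rank updates; the decorrelation and guide matrices of SSR are not reproduced here, and the helper names and toy matrices are assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matadd(a, b):
    """Add two matrices of the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def concat_along_rank(loras):
    """Stack (B_i, A_i) pairs: B = [B_1 ... B_n] by columns, A = [A_1; ...; A_n] by rows."""
    n_rows = len(loras[0][0])
    b_cat = [sum((b[row] for b, _ in loras), []) for row in range(n_rows)]
    a_cat = [row for _, a in loras for row in a]
    return b_cat, a_cat

# Two rank-1 LoRAs for a 2x2 weight; each update is B_i @ A_i.
lora_1 = ([[1], [2]], [[3, 4]])
lora_2 = ([[0], [1]], [[1, -1]])

b_cat, a_cat = concat_along_rank([lora_1, lora_2])
unified = matmul(b_cat, a_cat)                       # rank-2 unified subspace
summed = matadd(matmul(*lora_1), matmul(*lora_2))    # plain additive merge
```

Because `unified` equals `summed`, concatenation by itself is just additive merging; any gain has to come from what is inserted between the two stacked factors, which is where the abstract places its decorrelation and routing.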

2606.10614 2026-06-10 cs.RO cs.CV cs.LG New submission

Dexterous Point Policy: Learning Point-based Dexterous Hand Policies from Human Demonstrations

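Beomjun Kim, Seong Hyeon Park, Seunghoon Sim, Seungjun Moon, Sanghyeok Lee, Jinwoo Shin

AI summary: Proposes the Dexterous Point Policy framework, which learns dexterous manipulation policies from human videos through a unified 3D keypoint representation, needs no robot demonstrations, and reaches a 75% success rate on real-robot tasks.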

Abstract

Robotic foundation models pre-trained on human demonstration videos have shown promise, but a significant embodiment gap remains when the resulting policies are deployed on real robots. A common remedy is to fine-tune these models on robot-specific demonstrations. However, robot data collection can be prohibitively expensive and time-consuming, which is particularly acute in dexterous manipulation, e.g., teleoperating a multi-fingered hand for even a single atomic task can take days. To address this, we introduce Dexterous Point Policy, a framework that learns dexterous manipulation policies directly from human videos and requires no robot demonstrations. Our core insight is that a unified 3D keypoint representation can bridge human and robot embodiments when used for both observations and actions. Specifically, we extract 3D keypoints of task-relevant objects and human hands from raw videos, and train an autoregressive transformer over these keypoints. We observe that at the keypoint level, specifically the wrist and fingertips, human and robot behaviors closely align, enabling direct policy transfer. On a suite of real-robot tasks spanning pick-and-place and tool use, Dexterous Point Policy attains 75.0% success, whereas a state-of-the-art VLA baseline reaches only 1.0%. Furthermore, our method generalizes strongly to unseen scenarios, including multi-object environments and novel object categories.

2606.10613 2026-06-10 cs.LG cs.AI New submission

Fast and Highly Expressive Policy Learning for Offline Reinforcement Learning via Bootstrapped Flow Q-Learning

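Thanh Nguyen, Tri Ton, Hongbin Choe, Tung M. Luu, Chang D. Yoo

AI summary: Proposes Bootstrapped Flow Q-Learning (BFQ), which applies divide-and-conquer to the displacement vector and bootstraps short-range components to generate actions in a single step, without auxiliary networks or distillation, substantially cutting computational cost while improving performance.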

Journal ref
ICML 2026
Comments
ICML 2026, 19 pages
Abstract

Diffusion-based Q-learning has emerged as a powerful paradigm for offline reinforcement learning, but its reliance on multi-step denoising makes both training and inference computationally expensive and brittle. Recent efforts to accelerate diffusion Q-learning toward single-step action generation typically introduce auxiliary networks, policy distillation, or multi-phase training, which frequently compromise simplicity, stability, or performance. To address these limitations, we introduce Bootstrapped Flow Q-Learning (BFQ), a novel framework that enables accurate single-step action generation during both training and inference, without auxiliary networks or distillation procedures. BFQ adopts a divide-and-conquer view of the displacement vector along the flow path: it begins by learning short-range displacements that can be accurately estimated from the Flow Matching marginal velocity, and bootstraps these components to directly learn a noise-to-action mapping in a single step. This formulation eliminates multi-step denoising, resulting in a learning procedure that is substantially faster, simpler, and more robust. Extensive D4RL evaluations show that BFQ improves performance while significantly reducing computational cost compared to multi-step diffusion baselines, demonstrating that single-step action generation suffices for high-performance offline Reinforcement Learning.

2606.10612 2026-06-10 cs.CV New submission

GaussTrace: Provenance Analysis of 3D Gaussian Splatting Models with Evidence-based LLM Reasoning

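Haoliang Han, Ziyuan Luo, Renjie Wan

AI summary: Proposes the GaussTrace framework, which combines attribute-wise statistical analysis and hypothesis-driven editing simulations with LLM chain-of-thought reasoning to build directed provenance graphs for 3D Gaussian Splatting models, without training or editing histories.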

Comments
Accepted by ICML2026
Abstract

3D Gaussian Splatting (3DGS) is a powerful technique for creating high-fidelity 3D assets. However, the widespread sharing and iterative modification of 3DGS models across digital platforms create pressing challenges for intellectual property protection and forensic traceability. To address this, we propose GaussTrace, a novel framework for constructing directed provenance graphs for 3DGS models. GaussTrace formulates provenance analysis as an evidence-based reasoning problem. It builds upon attribute-wise statistical profiling of 3DGS parameters to capture intrinsic properties. Moreover, we introduce hypothesis-driven editing simulations of common operations to provide auxiliary evidence for plausible transformation pathways. These statistical and simulated cues jointly enable a Large Language Model (LLM) to perform structured Chain-of-Thought (CoT) reasoning, yielding directional provenance inferences and explainable edge reasons. Experimental results demonstrate that GaussTrace effectively constructs evolutionary relationships among diverse 3DGS models, delivering accurate, interpretable, and robust provenance graphs without requiring model training or access to editing histories. Project page: https://haolianghan.github.io/GaussTrace.

2606.10600 2026-06-10 eess.SY cs.LG cs.SY New submission

Toward Proactive RF Charging Scheduling: Generative AI for Decision Support

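Amirhossein Azarbahram, Osmel M. Rosabal, David Ernesto Ruiz-Guirola, Melike Erol-Kantarci, Kaibin Huang, Onel L. A. López

AI summary: Proposes generative AI as an uncertainty-aware support layer that helps RF wireless charging schedulers make robust charging decisions under limited resources and uncertain conditions, and validates the idea with a warehouse case study.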

Abstract

Radio frequency wireless power transfer (RF-WPT) is an enabling technology for supporting uninterrupted communications in future Internet of Things systems by reducing the need for battery replacement and mitigating battery-waste-related issues. For large-scale RF-WPT deployment, one of the main challenges is the scheduler-level resource allocation. Specifically, the transmitter must decide how much energy to deliver, when, and to whom, under limited charging resources, incomplete receiver-side information, and uncertain near-future charging conditions. This article positions generative artificial intelligence (GenAI) as a promising tool for this setting because it can foresee multiple plausible charging scenarios conditioned on coarse operational context and receiver-side information. We propose GenAI to act as an uncertainty-aware support layer for the RF-WPT scheduler rather than as a standalone forecasting or decision-making tool. To this end, we first revisit the main challenges of RF-WPT scheduling, and discuss how major GenAI families can support uncertainty-aware charging decisions by generating scenario-based inputs for downstream tasks. We then present a warehouse-style case study showing that preserving uncertainty through the sampling capability of generative models can improve robust charging decisions compared with deterministic prediction and simple non-learning baselines, especially under risk-sensitive objectives. Finally, we identify key open challenges and present some directions for future research.

2606.10596 2026-06-10 cs.LG cs.AI cs.SY eess.SY New submission

Embedding Hybrid Systems into Continuous Latent Vector Fields

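Sangli Teng, Hang Liu, Koushil Sreenath

AI summary: Proves that an n-dimensional hybrid system can be embedded into a continuous vector field in m-dimensional Euclidean space when m>2n, and builds on this result with a latent Neural ODE method that accurately recovers hybrid-system flows from time series data, outperforming existing methods.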

Comments
Accepted to ICML 2026
Abstract

This work proves that an $n$-dimensional hybrid system can be embedded into an $m$-dimensional Euclidean space equipped with a continuous vector field on its embedded image whenever $m>2n$. This result suggests that an intrinsically discontinuous hybrid system generically admits a continuous extrinsic representation that is well-posed for differentiable optimization. Building on this existence theorem, we show that a latent Neural ODE with consistency loss in both the latent and state space can accurately recover the flow of hybrid systems. Extensive experiments suggest the proposed method outperforms the existing method in learning hybrid systems with varying geometries from only time series data.

2606.10595 2026-06-10 cs.CR cs.AI New submission

From Data Heterogeneity to Convergence: A Data-Centric Review of Federated Learning

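Huong Nguyen, Mickaël Bettinelli, Amirhossein Ghaffari, Alexandre Benoit, Hong-Tri Nguyen, Susanna Pirttikangas, Lauri Lovén

AI summary: Systematically analyzes, from a data perspective, how data heterogeneity affects convergence in federated learning. Proposes a taxonomy of measurable traits, connects experimental splits to real-world phenomena, and evaluates how data-related vulnerabilities and defenses affect convergence, offering guidance for designing systems with predictable convergence.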

Abstract

Federated Learning (FL) has emerged as a promising solution to the data hunger of centralized learning. This paradigm lets multiple clients collaboratively train a shared-task model while preserving privacy, without exposing their local data. Although data is a key component of any learning system, it is also a primary source of vulnerabilities and challenges, and a major determinant of stable, well-converged training. Existing FL reviews describe general foundations, security practices, opportunities, challenges, and applications, but do not delve into the diverse aspects of data or consider problems from the data perspective. They rarely provide a data-lens synthesis that links concrete data properties, split protocols, and defenses to convergence speed and stability. This survey fills that gap with three contributions. First, we decompose non-IID data into measurable traits and rank their influence on convergence as strong, medium, or light, explaining the mechanisms behind each and reconciling evidence across images, texts, and graphs. Second, we connect experimental splitting practices to the real phenomena they emulate, expose the artifacts they introduce, and show how those artifacts affect target accuracy. Third, we analyze how data-related vulnerabilities and their proposed defenses affect convergence, reporting performance under clean and adversarial conditions to make the convergence-robustness trade-off explicit. To our knowledge, this is the first survey to provide a complete understanding of the data-related challenges that govern FL. With clear takeaways distilled for each concern, our work serves as actionable guidance, helping practitioners design systems with predictable convergence and stability.

2606.10587 2026-06-10 cs.LG cs.AI New submission

Towards Diverse Scientific Hypothesis Search with Large Language Models

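Haorui Wang, Parshin Shojaee, Kazem Meidani, Kunyang Sun, José Miguel Hernández-Lobato, Teresa Head-Gordon, Jiajun He, Chandan K. Reddy, Chao Zhang, Yuanqi Du

AI summary: Addresses diversity collapse in scientific hypothesis search by proposing a multi-temperature evolutionary framework based on parallel tempering, which improves hypothesis quality and diversity under a fixed validation budget.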

Comments
ICML 2026
Abstract

Large language models (LLMs) are increasingly used to accelerate scientific discovery, most recently in advanced tasks such as generating valid scientific hypotheses. Yet in many discovery settings, the goal is not to identify a single best hypothesis: validation can be noisy and expensive, and scientists benefit from a set of high-quality alternative hypotheses that hedge against downstream uncertainty about the best solutions. Nevertheless, commonly used evolutionary search recipes tend to prioritize optimization over exploration in hypothesis generation, and the resulting selection pressure during the search process leads to diversity collapse. Motivated by these limitations, we formulate hypothesis search as a sampling problem, where the objective is to efficiently produce diverse, high-quality hypotheses under a fixed validation budget. Building on this perspective, we propose an evolutionary framework inspired by the classical parallel tempering algorithm that searches hypotheses at multiple temperature levels and enables principled information exchange across temperatures to improve exploration without disrupting convergence. Across domains including molecular discovery, equation discovery, and algorithm discovery, our approach consistently improves both hypothesis quality and diversity under the same validation budget, and produces candidates that remain robust under more expensive downstream computational validations.
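
For readers unfamiliar with the classical parallel tempering algorithm that this abstract cites as inspiration, the sketch below shows its standard replica-exchange rule: neighbouring temperature levels swap states with probability min(1, exp((1/T_i - 1/T_j)(E_i - E_j))). How the paper maps hypotheses and validation scores onto states and energies is not stated in the abstract, so treating energy as "lower is better" and the names `swap_probability` and `exchange_step` are assumptions.

```python
import math
import random

def swap_probability(energy_i, energy_j, temp_i, temp_j):
    """Metropolis acceptance probability for exchanging two replicas."""
    delta = (1.0 / temp_i - 1.0 / temp_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)

def exchange_step(states, energies, temps, rng):
    """Attempt swaps between adjacent temperature levels, mutating the lists in place."""
    for k in range(len(temps) - 1):
        p = swap_probability(energies[k], energies[k + 1], temps[k], temps[k + 1])
        if rng.random() < p:
            states[k], states[k + 1] = states[k + 1], states[k]
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
    return states, energies

# The hot level (T=2.0) found a better (lower-energy) state than the cold level (T=1.0).
states = ["hypothesis_a", "hypothesis_b"]
energies = [5.0, 1.0]
temps = [1.0, 2.0]
exchange_step(states, energies, temps, random.Random(0))
```

When the hotter level holds the better state, the swap is always accepted, so good candidates found by exploratory levels migrate toward the cold, exploitative level. The reverse swap is accepted only with probability exp(-2) in this example.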

2606.10565 2026-06-10 cs.SD eess.AS New submission

A Lightweight Dual-Factor Acoustic Authentication System via Cascaded GMM-DTW Architecture for Edge Computing

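Yutong Zhang

AI summary: Proposes a lightweight cascaded GMM-DTW dual-factor voice lock system for resource-constrained edge environments. A shared MFCC feature space enables sequential defense, and a dynamic joint absolute-relative margin constraint delivers low latency and high security on low-power edge nodes.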

Abstract

This paper presents a lightweight, cascaded GMM-DTW dual-factor voice lock system for resource-constrained edge environments. By utilizing a shared MFCC feature space, the framework implements a sequential defense mechanism combining GMM speaker screening and DTW passphrase verification. To counter presentation threats without extra hardware, a dynamic joint absolute-relative margin constraint is integrated into the GMM classification space, limiting the physical imposter and high-fidelity replay attack False Acceptance Rates (FAR) to 2.73% and 6.67%, respectively, with a legitimate False Rejection Rate (FRR) of 16.67%. Due to Sakoe-Chiba window optimization, the global end-to-end processing latency under temporal stress is rigidly bounded at 9.82ms on a single-core CPU, comprising 1.51ms for feature extraction, 0.54ms for GMM scoring, and 7.77ms for worst-case DTW matching. These empirical benchmarks demonstrate the viability of white-box acoustic cascades for secure, deterministic real-time deployment on low-power edge nodes.
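
The latency bound in this abstract is attributed to a Sakoe-Chiba window, which restricts dynamic time warping to a diagonal band so that the worst-case cost grows with the band width rather than the full sequence product. A minimal sketch of banded DTW follows. It works on scalar sequences for brevity, whereas the paper uses MFCC feature vectors, and the function name and absolute-difference local cost are assumptions.

```python
import math

def dtw_sakoe_chiba(x, y, window):
    """DTW distance between two scalar sequences, restricted to |i - j| <= window."""
    n, m = len(x), len(y)
    window = max(window, abs(n - m))   # the band must be wide enough to reach cell (n, m)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells inside the band are evaluated; the rest stay at infinity.
        for j in range(max(1, i - window), min(m, i + window) + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

same = dtw_sakoe_chiba([1, 2, 3], [1, 2, 3], window=1)
stretched = dtw_sakoe_chiba([1, 2, 3], [1, 2, 2, 3], window=1)
shifted = dtw_sakoe_chiba([0, 0], [1, 1], window=0)
```

The second call shows why warping is useful for passphrases spoken at different speeds: a repeated frame is absorbed at zero cost. The third call uses a zero-width band, which reduces DTW to a frame-by-frame comparison.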

2606.10531 2026-06-10 cs.CL cs.AI New submission

LC-QAT: Data-Efficient 2-Bit QAT for LLMs via Linear-Constrained Vector Quantization

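Haoyu Wang, Xingyu Yu, Haiyan Zhao, Fengxiang Wang, Xu Han

AI summary: Proposes LC-QAT, a vector-quantization-aware training framework for 2-bit weight-only quantization that avoids discrete codebook lookup through a differentiable linear mapping, giving a high-quality PTQ initialization and end-to-end optimization, and outperforms existing methods with only 0.1%-10% of the training data.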

Comments
Accepted by ICML 2026
Abstract

Quantization-aware training (QAT) is essential for extremely low-bit large language models (LLMs). Current QAT methods are mainly based on scalar quantization (SQ), which enables efficient optimization but suffers from severe performance degradation at 2-bit precision. On the other hand, vector quantization (VQ) provides substantially higher representational capacity, but its discrete codebook lookup prevents end-to-end training. We propose LC-QAT, a 2-bit weight-only VQ-QAT framework that represents quantized weights via a learned affine mapping over discrete vectors, which yields a high-quality PTQ initialization and enables fully differentiable end-to-end optimization without explicit codebook lookup in the training forward pass. This strong post-training initialization makes LC-QAT highly data-efficient. Experiments across diverse LLMs demonstrate that LC-QAT consistently outperforms state-of-the-art QAT methods while using only 0.1% to 10% of the training data. Our results establish LC-QAT as a practical and scalable solution for extreme low-bit model deployment.

2606.10525 2026-06-10 cs.CR cs.AI New submission

Assessing Automated Prompt Injection Attacks in Agentic Environments

David Hofer, Edoardo Debenedetti, Florian Tramèr

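AI summary: Finds that in agentic environments black-box optimization (TAP) is more effective than gradient-based methods (GCG), that attack success depends on the attacker model, and that task-universal attacks transfer across tasks but transfer poorly across models.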

Details
Abstract

Indirect prompt injection poses a critical threat to LLM agents that interact with untrusted external data, yet automated attack methods, which have proven effective for jailbreaking, remain underexplored in realistic agentic settings. We present a comprehensive empirical evaluation of automated prompt injection attacks against LLM agents, adapting both white-box (GCG) and black-box (TAP) methods to the agentic setting within the AgentDojo framework. We evaluate across 80 task pairs spanning four domains and multiple models, and find that black-box optimization substantially outperforms gradient-based methods, a gap we attribute to GCG's optimization instability under reasonable compute budgets. We also find that TAP's effectiveness depends on the attacker model, as both general capability and safety tuning affect attack success: stronger models produce more effective injections, while safety-tuned attackers can refuse to generate adversarial prompts. Task-universal attacks transfer effectively to unseen tasks and out-of-distribution domains, but attacks optimized on smaller open-source models do not transfer to frontier models like GPT-5. These findings highlight automated prompt injection as a credible but model-dependent threat, with significant barriers remaining for model-agnostic exploitation.

2606.10520 2026-06-10 cs.CL New submission

UniSVQ: 2-bit Unified Scalar-Vector Quantization

Haoyu Wang, Haiyan Zhao, Xingyu Yu, Zhangyang Yao, Xu Han, Zhiyuan Liu, Maosong Sun

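AI summary: Proposes UniSVQ, which unifies scalar and vector quantization by parameterizing codewords as an affine transform of integer lattice points. At 2 bits it outperforms scalar quantization and matches vector quantization while delivering higher inference throughput.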

Details
Comments
Accepted by ICML 2026
Abstract

Post-training quantization at the 2-bit level enables low-cost deployment and inference acceleration for large language models (LLMs). Scalar quantization (SQ) and vector quantization (VQ) are the two primary quantization methods; however, the former suffers from significant performance degradation, and the latter incurs computational and storage overhead. We propose UniSVQ, a unified 2-bit quantization framework that bridges scalar and vector quantization by parameterizing codewords as an affine transform of integer lattices. This structure preserves compatibility with optimized integer kernels while retaining much of VQ's flexibility. We further introduce a data-driven block-wise fine-tuning strategy to directly minimize quantization reconstruction error. Extensive experiments across multiple LLM families and zero-shot benchmarks demonstrate that UniSVQ consistently outperforms state-of-the-art SQ methods and achieves performance comparable to advanced VQ methods, while providing higher inference throughput.
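
To make the phrase "affine transform of integer lattices" concrete, here is a small sketch under stated assumptions: each group of weights is stored as a vector of 2-bit integers `z`, and the codeword is recovered as `A z + b`. The names `decode_codeword`, `A`, and `b`, and the group size of 2, are illustrative and not taken from the paper.

```python
def decode_codeword(z, A, b):
    """Map an integer lattice point z to a real-valued codeword A z + b."""
    return [
        sum(a_ij * z_j for a_ij, z_j in zip(row, z)) + b_i
        for row, b_i in zip(A, b)
    ]


z = [1, 3]  # two 2-bit codes, each in {0, 1, 2, 3}

# Diagonal A: every coordinate is an independent scale-and-offset scalar quantizer.
scalar_like = decode_codeword(z, [[0.5, 0.0], [0.0, 0.5]], [-0.75, -0.75])

# Off-diagonal entries couple the coordinates, as a vector quantizer would.
vector_like = decode_codeword(z, [[0.5, 0.25], [0.0, 0.5]], [-0.75, -0.75])
```

In this sketch the stored codes stay integers in both cases; only the small matrix `A` and offset `b` decide whether the codebook behaves like a scalar grid or a skewed lattice.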

2606.10504 2026-06-10 cs.AI New submission

Cross-Modal Knowledge Distillation without Paired Data: Theoretical Foundation and Algorithm

Trong Khiem Tran, Anh Duc Chu, Quang Hung Pham, Phi Le Nguyen, Trong Nghia Hoang

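AI summary: Proposes a cross-modal knowledge distillation framework for unpaired data. Two distribution-alignment mechanisms, feature alignment and label alignment, enable cross-modal knowledge transfer with theoretical guarantees and strong empirical results.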

Details
Abstract

Cross-modal knowledge distillation (CMKD) studies how a (large) teacher model trained on one type of data (e.g., images) can guide a (smaller) student model that operates on another type of data (e.g., text/audio). Existing CMKD methods often require paired multi-modal data with aligned semantics, but obtaining such paired data is often costly and impractical. To mitigate this limitation, we develop a new CMKD framework for the more challenging setting where paired data are unavailable. In particular, we establish a cross-modal distributional relationship between teacher and student models, which reveals two fundamental quantities governing effective distillation: feature alignment and label alignment. These quantities characterize semantic discrepancy between modalities at the levels of representation and prediction distributions, respectively. Motivated by this insight, we propose a principled framework, with theoretical guarantees, that enables effective cross-modal knowledge distillation by aligning distributions rather than individual samples. Extensive experiments across a wide range of multimodal benchmarks show that our framework is highly effective in both unpaired and paired data settings, improving significantly over prior work.

2606.10500 2026-06-10 cs.AI New submission

A Reliable Fault Diagnosis Method Based on Belief Rule Base Consider Robustness Analysis

Mingyuan Liu, Dan Yin, Zongzong Wu

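AI summary: Addresses the reliability of sensor readings in fault diagnosis with a belief rule base method. Robustness analysis and optimization strategies improve model accuracy and robustness, validated on diesel engine and bearing fault diagnosis.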

Details
Abstract

In equipment operation, fault diagnosis is essential to ensure the continuity and safety of production equipment, improve operational efficiency, and reduce maintenance costs. Since sensor readings are widely used for fault diagnosis, their reliability directly affects the diagnosis results. To address the two problems of robustness assessment and robustness optimization of fault diagnosis models, a reliable fault diagnosis method based on a belief rule base (BRB) considering robustness analysis is proposed. Firstly, the robustness analysis of the BRB model is carried out systematically. Secondly, three robustness constraint strategies are proposed to optimize the robustness of the BRB fault diagnosis model. Finally, the effectiveness of the proposed model is verified using fault diagnosis of a WD615 diesel engine and of Case Western Reserve University bearings as examples, and the experiments show that the proposed model improves both accuracy and robustness.

2606.10493 2026-06-10 cs.DC cs.AI cs.LG cs.NE New submission

Achieving Cloud-Grade SLOs for Local Mixture-of-Experts Inference through CPU-GPU Hybrid Design

Wenxin Wang, Yule Hou, Yu Ji, Peng Qu, Youhui Zhang

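AI summary: Local MoE inference falls short of cloud-level service quality even at low concurrency. The paper proposes a CPU-GPU hybrid system that reaches cloud-level SLOs on consumer hardware through stream-loading prefill, distributed SLP, intra-node prefill-decode disaggregation, an AVX-512-optimized FP8 GEMV kernel, and fine-grained CPU parallelism.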

Details
Comments
Accepted to the 20th USENIX Symposium on Operating Systems Design and Implementation (OSDI '26). The official version will appear in the OSDI '26 proceedings published by USENIX.
Abstract

Local deployment of large Mixture-of-Experts (MoE) models falls short of the service quality achieved in cloud-scale environments, even under low-concurrency workloads. We identify four key gaps in local MoE inference: reliance on capacity-reduced models (quantized, distilled, rerouted), inability to meet 30-second TTFT for long prefills (more than 12K), sub-baseline decode throughput (under 20 tokens/s), and poor concurrency under mixed prefill-decode and batched decode workloads. We present a CPU-GPU hybrid system that achieves cloud-level SLOs on dual-socket commodity CPUs and consumer GPUs by (1) stream-loading prefill (SLP), boosting prefill throughput to 1,200 tokens/s and enabling 32K prompts within 30 seconds; (2) distributed SLP (DSLP) with SmallEP expert parallelism, reaching 1,800 tokens/s and 45K prompts in 30 seconds on two RTX 5090s; (3) intra-node prefill-decode disaggregation with zero-copy shared weights and a dual-batch attention-MoE overlap scheme, sustaining concurrency with under 15 percent latency increase and 50 percent throughput gains; (4) an AVX-512-optimized FP8 GEMV kernel, enabling native CPU FP8 inference while delivering 4-5x lower CPU latency; and (5) fine-grained CPU parallelism that attains 28 tokens/s on INT4 DeepSeek-V3 and 21.5 tokens/s on intact FP8 V3. Evaluations show our system delivers cloud-level QoS for flagship MoE models on consumer CPU-GPU platforms, reshaping local deployment with intact, original-precision inference and enabling high-quality, cost-effective access without datacenter infrastructure.
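
The prefill figures can be sanity-checked with simple arithmetic. This sketch assumes the quoted throughput is sustained for the whole prompt, that "K" means 1,000 tokens, and that nothing else contributes to time to first token; `prefill_seconds` is a hypothetical helper, not part of the described system.

```python
def prefill_seconds(prompt_tokens, tokens_per_second):
    """Time to prefill a prompt at a constant throughput."""
    return prompt_tokens / tokens_per_second


TTFT_BUDGET_S = 30  # time-to-first-token target from the abstract

slp_s = prefill_seconds(32_000, 1_200)   # single-GPU SLP figures
dslp_s = prefill_seconds(45_000, 1_800)  # two-GPU DSLP figures
```

Under these assumptions both configurations fit inside the 30-second budget with a few seconds of headroom (roughly 26.7 s and 25 s).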

2606.10475 2026-06-10 cs.MA cs.AI cs.CL New submission

Decoupling Thought from Speech: Knowledge-Grounded Counterfactual Reasoning for Resilient Multi-Agent Argumentation

Jakub Masłowski, Jarosław A. Chudziak

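AI summary: Proposes Knowledge-Grounded Counterfactual Reasoning (KG-CFR), a dual-stage architecture that separates a private planning buffer from a public execution layer. In a dynamic resource allocation environment with perturbations it raises overall argument quality from 0.694 to 0.822 and reduces semantic looping.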

Details
Comments
Accepted for publication in the Proceedings of the 30th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems (KES 2026)
Abstract

Multi-agent debate frameworks have been shown to improve large language model performance in convergent tasks, but they are currently optimized in a way that heavily favors final output accuracy rather than stability of the process. During long-horizon exchanges, reactive systems under sustained perturbations often experience logic degradation, argument repetition, and role drift. To structurally prevent identity loss and maintain process fidelity, we introduce Knowledge-Grounded Counterfactual Reasoning (KG-CFR), a dual-stage architecture that enforces a strict separation of concerns between a private, retrieval-augmented planning buffer and a public execution layer. We assess this system in Dynamic Resource Allocation under Uncertainty (DRAU), a dedicated 1v1v1 environment that introduces diversity distinct from standard debate settings. Across 270 full-factorial crisis simulation trajectories with stochastic environmental shocks, KG-CFR prevents judge-detected critical post-shock degradation (defined as a quality shift $\Delta \le -0.20$) in more than 95% of perturbed runs, increasing the overall argument quality from 0.694 to 0.822. Our primary contribution is to demonstrate that architectural decoupling is an important factor in enhancing systemic resilience under sustained pressure without quality loss. Furthermore, we introduce custom vector metrics for discourse divergence and plan-execution alignment that provide strong, directionally consistent evidence of operational stability. Our ablation experiments suggest that proper doctrinal grounding can be as important for argument quality as prospective planning. According to our initial metric evaluations, KG-CFR reduces semantic looping by preserving the agent's consistency with the original plan.

2606.10472 2026-06-10 cs.GT cs.LG New submission

Trading Utility for Dynamic Fairness in Multiple Resource Division with Sequential Demand

Kaiqi Jiang, Karim El Husseini, Wenzhe Fan, Xinhua Zhang

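AI summary: Proposes a neural allocation mechanism that balances fairness and utility in sequential allocation through multi-objective optimization, achieving higher utility at comparable fairness.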

Details
Abstract

Dynamic multi-resource allocation is a central problem in shared computing environments, where users' demands arrive sequentially and resources must be distributed fairly without knowledge of future demands. Existing methods emphasize fairness guarantees such as Sharing Incentive, Envy Freeness, and Dynamic Pareto Optimality, but often overlook system utility. Moreover, these fairness criteria are mutually incompatible, so they cannot all be strictly enforced at the same time. We propose a neural allocation mechanism that reconciles fairness with utility through multi-objective optimization during sequential rollout. We first formalize fairness in the dynamic setting via stepwise loss functions for Sharing Incentive, Envy Freeness, and Dynamic Pareto Optimality, enabling differentiable training. Leveraging non-wastefulness, we parameterize the solutions by constraining allocations to the subspace of demand while allowing elastic over-allocation when resources remain available. Empirical results demonstrate that our learned allocator achieves substantially higher utility at comparable levels of fairness, uncovering clear Pareto-frontier-like tradeoffs across metrics.
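
As a point of reference for two of the fairness criteria named above, the following sketch checks their standard static definitions on a single allocation snapshot. It assumes Leontief utilities (a user's utility is limited by the scarcest resource relative to demand) and capacities normalized to 1; neither assumption comes from the abstract, and the paper's stepwise losses for the dynamic setting are not reproduced here. All function names are hypothetical.

```python
def utility(allocation, demand):
    """Leontief utility: how many units of the demand vector the allocation covers."""
    # Assumes at least one resource has positive demand.
    return min(x / d for x, d in zip(allocation, demand) if d > 0)


def is_envy_free(allocations, demands, tol=1e-9):
    """True if no user would get more utility from another user's bundle."""
    users = range(len(demands))
    return all(
        utility(allocations[i], demands[i]) + tol >= utility(allocations[j], demands[i])
        for i in users
        for j in users
    )


def has_sharing_incentive(allocations, demands, tol=1e-9):
    """True if every user does at least as well as with an equal 1/n split."""
    n = len(demands)
    equal_split = [1.0 / n] * len(demands[0])
    return all(
        utility(allocations[i], demands[i]) + tol >= utility(equal_split, demands[i])
        for i in range(n)
    )


# Two users with mirrored demands over two unit-capacity resources.
demands = [[1.0, 2.0], [2.0, 1.0]]
allocations = [[1 / 3, 2 / 3], [2 / 3, 1 / 3]]
```

A differentiable training loss would replace these boolean checks with a measure of how far each inequality is violated; the exact form used in the paper is not given in the abstract.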

2606.10461 2026-06-10 cs.LG cs.AI cs.CL New submission

ERAlign: Energy-based Representation Alignment of GNNs and LLMs on Text-attributed Graphs

Xianlin Zeng, Fan Xia, Xiangyu Chen

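AI summary: Proposes ERAlign, a framework that uses energy-based models to align GNN and LLM representations. Optimizing with Energy Discrepancy yields distribution consistency and state-of-the-art performance on eight datasets.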

Details
Comments
Accepted to ICML 2026
Abstract

Text-attributed Graphs (TAGs) incorporate textual node attributes with graph structures to describe rich relational semantics. Recent efforts to integrate Graph Neural Networks (GNNs) and Large Language Models (LLMs) have shown promise for learning on TAGs, yet achieving well-aligned representations remains challenging. Prior studies largely rely on heuristics that perform coarse-grained matching. They lack sufficient constraints and ignore distributional alignment, leading to representation drift and limited generalization. Building on Energy-based Models (EBMs), we propose an Energy-based Representation Alignment (ERAlign) framework that projects GNN-encoded graph structure and LLM-derived text embeddings in a shared latent space to achieve distribution consistency. Concretely, layer-wise alignment is quantified by a distance metric and optimized via an EBM objective. By decreasing energy values, our framework yields well-aligned representations for downstream tasks. During training, we introduce Energy Discrepancy (ED) to avoid high sampling costs associated with intractable normalization. ED also carries theoretical guarantees of higher training efficiency and reduced energy landscape distortion. Empirical evaluations on eight TAG datasets demonstrate that ERAlign obtains state-of-the-art performance across varying levels of supervision and cross-task transfer scenarios.

2606.10459 2026-06-10 cs.SI cs.CL New submission

Leveraging Social Media Data for COVID-19 Studies

Nur Hafieza Ismail, Nur Shazwani Kamarudin, Nurol Husna Che Rose

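AI summary: Examines the role of social media during the COVID-19 pandemic, categorizes the data used, reviews machine learning, feature engineering, natural language processing, and survey methods, and outlines directions for future research.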

Details
Comments
8 pages, 1 figure
Abstract

Nowadays, social media networks have become widely preferred sources of information. Especially during the coronavirus disease 2019 (COVID-19) pandemic, social media has been one of the most used platforms for getting the latest news and information related to COVID-19. Social media platforms are popular because they offer free access to registered users and allow them to post, disseminate information, and respond to others' posts. With almost 4.6 billion social media users worldwide, it is not surprising that the significant amount of information shared through these platforms could affect how people perceive and cope with the pandemic that we are facing right now. Used responsibly, social media can be a beneficial digital tool for spreading reliable news and raising public awareness among patients, clinicians, and society. Specifically, this chapter describes linguistic, visual, and emotional indicators expressed in user disclosures. It explores and discusses in detail the related studies of social media platform usage during the COVID-19 pandemic. The chapter also categorizes the social media data used, introduces the machine learning, feature engineering, natural language processing, and survey methods that have been deployed, and outlines directions for future research.

2606.10456 2026-06-10 cs.CR cs.AI New submission

The Distributed Detectability Band Against Marginal-Preserving Attacks

Zhang Qinqin, Gao Yuze

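AI summary: Studies marginal-preserving attacks on AI monitoring. A Gaussian-copula AR(1) construction encodes harm in temporal correlation; distribution-shape monitors fail while temporal-correlation monitors remain effective, giving a non-empty detectability band.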

Details
Comments
10 pages, 11 figures
Abstract

AI-control monitors score individual agent actions to detect misbehavior, but real harm can be distributed across many benign-looking steps, each individually below any per-step alarm. We construct a marginal-preserving, correlation-encoded distributed-sabotage attack using a Gaussian-copula AR(1) construction: the per-step monitor-score marginal is held exactly equal to benign, so mean, max, top-k tail, and threshold monitors (Monitor A) are defeated by construction, while harm is encoded in the temporal correlation structure. We sequence the paper around three reviewer-mandated gates. (1) Realizability gate: the stealthy attack achieves KS-distance to benign of 0.013 (effectively zero) at all tested harm levels up to 3.0, confirming that harm is fully decoupled from the per-step marginal and realizability is not harm-limited. (2) Monitor-A-vs-B reconciliation: we show formally that the attack, built against Monitor A's score marginal, remains marginal-preserving under a different-score Monitor B (the correlation/sequence family: CUSUM, SPRT, HMM-LR, runs test, autocorrelation, windowed logistic), and scope worst-case claims to score functions that admit a temporal signature. (3) Non-empty detectability band: Monitor A achieves AUC 0.52 (chance); Monitor B spans AUC 0.79-0.97 at the same 1% FPR target, and as harm is amortized over more steps Monitor A collapses to chance while Monitor B holds at AUC ~0.95. These results demonstrate a non-empty detectability band and characterize the sub-threshold sabotage frontier: distribution-shape monitors fail by construction; temporal-correlation monitors can detect but are not trivially optimal.
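
The construction can be illustrated with a short simulation. This is a sketch of a generic Gaussian-copula AR(1) process, not the authors' code: it assumes the benign per-step score is Uniform(0, 1) (any other marginal would be obtained by applying its inverse CDF), and it uses the AR coefficient `rho` as a stand-in for how strongly the sequence is correlated. The function names are hypothetical.

```python
import random
from statistics import NormalDist


def copula_ar1_scores(n, rho, seed=0):
    """Scores with a Uniform(0, 1) marginal and AR(1) dependence in the latent Gaussian."""
    rng = random.Random(seed)
    phi = NormalDist().cdf
    z = rng.gauss(0.0, 1.0)            # start in the stationary N(0, 1) distribution
    scale = (1.0 - rho * rho) ** 0.5   # keeps the latent variance at 1 for every step
    scores = []
    for _ in range(n):
        scores.append(phi(z))          # probability integral transform to a uniform
        z = rho * z + scale * rng.gauss(0.0, 1.0)
    return scores


def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation, a simple temporal statistic."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var


benign = copula_ar1_scores(20_000, rho=0.0, seed=1)
attack = copula_ar1_scores(20_000, rho=0.9, seed=1)
```

A monitor that looks only at the distribution of individual scores sees the same uniform marginal in both runs, while the lag-1 autocorrelation separates them clearly.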

2606.10450 2026-06-10 cs.CV cs.LG New submission

Few-step Generative Models as Lossy Compression

Fuma Kimishima, Jinjia Zhou

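AI summary: Studies the use of few-step generative models (Rectified Flow, CTM, MeanFlow) for lossy compression within the reverse channel coding framework. Parameterization equivalences and local Gaussian approximations give codecs that need no retraining, cutting encoding and decoding time and improving realism at low bit rates on low-resolution benchmarks.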

Details
Abstract

DiffC provides a principled way to reuse pre-trained diffusion models for lossy compression, but its encoding and decoding procedures remain slow because they require many discretized forward and reverse steps. We study whether few-step generative models, namely Rectified Flow, Consistency Trajectory Models (CTM), and MeanFlow, can be cast as codecs within the same reverse channel coding (RCC) framework. The main challenge is that RCC requires posterior and shared distribution parameters, whereas these models do not explicitly parameterize intermediate conditional distributions. For Rectified Flow and MeanFlow, we use the equivalence between velocity parameterization and diffusion-style denoising parameterization to derive the quantities required by RCC. For CTM, which is distilled from EDM, we adopt the EDM noise parameterization together with local Gaussian approximations of the sender and shared distributions at intermediate states. This yields a proof-of-concept probabilistic formulation that enables compression with pre-trained few-step generative models without retraining. On low-resolution benchmarks, the resulting codecs reduce encoding and decoding time and improve realism in the low-bit-rate regime.
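
The equivalence between a velocity prediction and a denoising prediction can be illustrated for a straight-line interpolation path. The sketch assumes the convention x_t = (1 - t) * x0 + t * eps with velocity v = eps - x0; papers differ on the direction of time, so the signs here are an assumption rather than the authors' exact formulation, and how RCC then consumes these estimates is not shown. `denoise_from_velocity` is a hypothetical name.

```python
def denoise_from_velocity(x_t, v, t):
    """Recover data and noise estimates from a velocity prediction at time t."""
    x0_hat = x_t - t * v           # follow the line back to t = 0 (data)
    eps_hat = x_t + (1.0 - t) * v  # follow the line forward to t = 1 (noise)
    return x0_hat, eps_hat


# Scalar example with a known data point and noise sample.
x0, eps, t = 2.0, -1.0, 0.25
x_t = (1.0 - t) * x0 + t * eps
recovered = denoise_from_velocity(x_t, eps - x0, t)
```

With an exact velocity the data and noise are recovered exactly; with a learned velocity the same two lines give the estimates that a diffusion-style posterior would be built from.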

2606.10440 2026-06-10 cs.DC cs.LG cs.NI New submission

ASTRA-sim 3.0: Next-Level Distributed Machine Learning Simulations via High-Fidelity GPU and Infrastructure Modeling

William Won, Jinsun Yoo, Tuan Ta, Moumita Dey, Andy Balogh, Pradosh Datta, Furkan Eris, Conor Green, Winston Liu, Changhai Man, Kingshuk Mandal, Amos Rai, Vinay Ramakrishnaiah, Ruchi Shah, David Sidler, Harsh Sikhwal, Hanjiang Wu, Tushar Krishna, Bradford M. Beckmann

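AI summary: Addresses gaps in modeling latency-sensitive communication for distributed machine learning with ASTRA-sim 3.0. Fine-grained simulation at cache-line load-store granularity and InfraGraph, a standardized infrastructure representation, provide high-fidelity simulation for design space exploration of collective algorithms, network requirements, and GPU architectures.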

Details
Comments
10 pages, 15 figures, 1 table
Abstract

Distributed machine learning (ML) is a key paradigm for today's large-scale artificial intelligence applications. As model inference emerges as an important use case, faithful modeling of latency-sensitive collective communication has never been more important. Capturing the device architecture and modeling control and data paths at high fidelity is therefore a necessity today. Having a common, detailed representation for distributed ML infrastructure is also crucial. We revisit the promising open-source, community-driven simulator ASTRA-sim. In this work, we identify limitations of the current ASTRA-sim simulator and augment it with new features. To this end, we enable fine-grained, high-fidelity simulation with a standardized infrastructure representation, opening new design space exploration opportunities. We propose simulation at cache-line-sized load-store granularity, with a detailed graphics processing unit (GPU) execution model, to balance simulation scalability and fidelity. We also introduce InfraGraph, a standardized representation to capture distributed ML network infrastructure in detail. Using the updated ASTRA-sim 3.0 simulator, we showcase interesting design space explorations for optimized collective algorithms, network requirements, and GPU architectures.

2606.10439 2026-06-10 cs.SD cs.CL eess.AS New submission

Enhancing Multilingual LLM-based ASR with Mixture of Experts and Dynamic Downsampling

Guodong Lin, Ziqi Chen, Yuxiang Fu, Ke Li, Wei-Qiang Zhang

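AI summary: Proposes a projector-based LLM-ASR framework. A Mixture of Experts architecture improves cross-lingual adaptability and a Continuous Integrate-and-Fire mechanism provides dynamic downsampling and modality alignment; experiments show substantial gains over strong baselines.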

Details
Journal ref
ICASSP (2026), 18807-18811
Comments
Accepted by ICASSP 2026
Abstract

The rapid progress of large language models (LLMs) has opened up a new frontier for automatic speech recognition (ASR), making their effective integration a critical and challenging research direction. To this end, this work proposes a projector-based LLM-ASR framework targeting the key challenges of multilingual generalization and modality alignment. Our approach incorporates a Mixture of Experts (MoE) architecture to improve cross-lingual adaptability, and a Continuous Integrate-and-Fire (CIF) mechanism for dynamic downsampling and modality alignment. Experimental results show that the combination of these components yields substantial performance improvements, surpassing strong baseline models. The proposed method represents a step toward building more accurate, robust, and generalizable LLM-based ASR systems.
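
The downsampling role of CIF can be seen in a few lines. This is a sketch of the commonly described integrate-and-fire rule, with frame weights assumed to lie in [0, 1] and supplied directly rather than predicted by a network; it is not the paper's implementation, and `cif` is a hypothetical name.

```python
def cif(frames, alphas, threshold=1.0):
    """Integrate weighted frames and emit one output each time the weight sum hits threshold."""
    outputs = []
    acc = 0.0                               # accumulated weight since the last firing
    integrated = [0.0] * len(frames[0])     # weighted sum of frames since the last firing
    for h, a in zip(frames, alphas):
        if acc + a < threshold:
            acc += a
            integrated = [s + a * x for s, x in zip(integrated, h)]
        else:
            used = threshold - acc          # part of this frame's weight that completes the output
            outputs.append([s + used * x for s, x in zip(integrated, h)])
            acc = a - used                  # the remainder starts the next output
            integrated = [acc * x for x in h]
    return outputs


frames = [[1.0], [2.0], [3.0], [4.0]]
tokens = cif(frames, [0.5, 0.5, 0.5, 0.5])
```

Four input frames yield two outputs here; because the number of firings follows the sum of the weights, the downsampling rate adapts to the input instead of being fixed.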

2606.10412 2026-06-10 cs.AI New submission

A Unified Multi-Modal Framework for Intelligent Financial Systems: Integrating Reinforcement Learning, High-Frequency Trading, and Game-Theoretic Approaches with Cross-Modal Sentiment Analysis

Fanrong Liu, Zhang Yuwei, Mingni Luo

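AI summary: Proposes a unified framework integrating PPO, high-frequency prediction, in-context learning, game theory, and cross-modal sentiment analysis, reporting average gains of more than 20% across several financial tasks.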

Details
Abstract

The rapid evolution of financial technology demands sophisticated artificial intelligence systems capable of handling diverse challenges across multiple domains simultaneously. This paper presents a groundbreaking unified framework that seamlessly integrates Proximal Policy Optimization for robo-advisory systems, advanced time-series prediction models for high-frequency trading, in-context learning mechanisms for dynamic investment advisory, game-theoretic approaches for competitive banking scenarios, and unified embeddings for cross-modal financial sentiment analysis. Our comprehensive framework addresses the critical gap in existing literature where these technologies have been developed in isolation, failing to leverage their synergistic potential. Through extensive experimentation across multiple financial datasets and real-world scenarios, we demonstrate that our integrated approach achieves superior performance compared to specialized single-domain systems. Specifically, our framework shows a 23.7% improvement in portfolio optimization metrics, reduces prediction error in high-frequency trading by 31.2%, enhances investment recommendation accuracy by 18.9%, optimizes competitive banking strategies with a 27.4% increase in Nash equilibrium convergence speed, and improves sentiment analysis accuracy by 15.6% through cross-modal fusion. The theoretical foundation of our work establishes convergence guarantees for the integrated optimization problem, while our empirical results validate the practical applicability across diverse financial institutions. This research not only advances the state-of-the-art in financial AI but also provides a blueprint for developing comprehensive intelligent systems that can adapt to the complex, interconnected nature of modern financial markets.