arXivDaily: arXiv daily academic digest, updated Monday through Friday
2601.09051 2026-05-14 cs.LG

Deep Incomplete Multi-View Clustering via Hierarchical Imputation and Alignment

Yiming Du, Ziyu Wang, Jian Li, Rui Ning, Lusi Li

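Affiliations Department of Computer Science, Old Dominion University

AI summary This study addresses incomplete multi-view clustering (IMVC), which aims to discover shared cluster structure from partially observed multi-view data. To impute missing data accurately while preserving semantic consistency across views, it proposes DIMVC-HIA, a deep framework that combines hierarchical imputation and alignment through view-specific autoencoders, a hierarchical imputation module, an energy-based semantic alignment module, and a contrastive assignment alignment module. Experiments show superior clustering performance across different levels of missingness.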

Comments Accepted by AAAI 2026

Journal ref Proceedings of the AAAI Conference on Artificial Intelligence, 40(25):20941-20949, 2026

English abstract

Incomplete multi-view clustering (IMVC) aims to discover shared cluster structures from multi-view data with partial observations. The core challenges lie in accurately imputing missing views without introducing bias, while maintaining semantic consistency across views and compactness within clusters. To address these challenges, we propose DIMVC-HIA, a novel deep IMVC framework that integrates hierarchical imputation and alignment with four key components: (1) view-specific autoencoders for latent feature extraction, coupled with a view-shared clustering predictor to produce soft cluster assignments; (2) a hierarchical imputation module that first estimates missing cluster assignments based on cross-view contrastive similarity, and then reconstructs missing features using intra-view, intra-cluster statistics; (3) an energy-based semantic alignment module, which promotes intra-cluster compactness by minimizing energy variance around low-energy cluster anchors; and (4) a contrastive assignment alignment module, which enhances cross-view consistency and encourages confident, well-separated cluster predictions. Experiments on benchmarks demonstrate that our framework achieves superior performance under varying levels of missingness.

2601.06147 2026-05-14 cs.LG cs.CL stat.ML

LLM Flow Processes for Text-Conditioned Regression

Felix Biggs, Samuel Willis

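Affiliations Secondmind; Wayve

AI summary This paper studies how to use pre-trained large language models (LLMs) effectively for text-conditioned regression. Because autoregressive LLM predictions suffer error cascades even on short sequences, are computationally intensive, and are hard to parallelise, the authors combine the LLM's marginal predictive densities with a lightweight diffusion-based neural process to improve calibration and local consistency. The work also introduces a gradient-free, non-Monte Carlo method for sampling from the product of a score model and an expert density, which is of independent interest.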

English abstract

Recent work has demonstrated surprisingly good performance of pre-trained LLMs on regression tasks (for example, time-series prediction), with the ability to incorporate expert prior knowledge and the information contained in textual metadata. However, we observe major error cascades even in short sequences (fewer than ~100 points); these models are also computationally intensive and difficult to parallelise. Marginal LLM predictions do not suffer from this issue and are trivially parallelised, but can predict over-broad densities. To address this, we propose combining these densities with a lightweight (diffusion-based) neural process. We show that this combination leads to better-calibrated predictions overall, outputs locally consistent trajectories, and leads to text-conditioned function space selection in the meta-learner. As part of this work we propose a gradient-free (and non-Monte Carlo) method for sampling from a product-of-experts of a score model and an 'expert' (here the LLM predictive densities). We believe this general method is of independent interest as it is applicable whenever an expert can be convolved with a Gaussian in closed form.
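The closing condition, an expert that can be convolved with a Gaussian in closed form, can be illustrated with a Gaussian mixture. The sketch below is not the authors' sampler: it only shows that convolving a 1-D mixture with N(0, noise_var) keeps the weights and means and adds noise_var to each component variance. The function names and the mixture representation of the expert are assumptions made for illustration.

```python
import math

def convolve_mixture_with_gaussian(weights, means, variances, noise_var):
    """Convolve a 1-D Gaussian mixture with N(0, noise_var) in closed form."""
    # Weights and means are unchanged; each component variance grows by noise_var.
    return list(weights), list(means), [v + noise_var for v in variances]

def mixture_pdf(x, weights, means, variances):
    """Density of a 1-D Gaussian mixture evaluated at x."""
    return sum(
        w * math.exp(-((x - m) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
        for w, m, v in zip(weights, means, variances)
    )

# Hypothetical two-component "expert" density (illustrative numbers only).
weights, means, variances = [0.3, 0.7], [-1.0, 2.0], [0.5, 1.0]
smoothed = convolve_mixture_with_gaussian(weights, means, variances, noise_var=0.25)
print(smoothed[2])  # component variances after smoothing
```

Any expert family with this property (for example, mixtures of Gaussians fitted to binned LLM densities) would satisfy the stated condition; which family the paper uses is not given in the abstract.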

2601.01860 2026-05-14 cs.LG quant-ph

High-Order Epistasis Detection Using Factorization Machine with Quadratic Optimization Annealing and MDR-Based Evaluation

Shuta Kikuchi, Shu Tanaka

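Affiliations Graduate School of Science and Technology; Keio University Sustainable Quantum Artificial Intelligence Center (KSQAIC); Keio University; Department of Applied Physics and Physico-Informatics; Human Biology-Microbiome-Quantum Research Center (WPI-Bio2Q)

AI summary This paper addresses high-order epistasis detection, a challenging problem in genetic association studies, and proposes an efficient detection method based on a factorization machine with quadratic-optimization annealing (FMQA). The method casts epistasis detection as a black-box optimization problem whose objective is the classification error rate computed by MDR, avoiding the computational bottleneck of exhaustive high-order interaction search. Experiments show that it identifies predefined high-order epistasis efficiently and accurately across a range of settings.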

Comments 6 pages, 2 figures

Journal ref 2026 International Conference on Quantum Communications, Networking, and Computing (QCNC), pp. 924-929

English abstract

Detecting high-order epistasis is a fundamental challenge in genetic association studies due to the combinatorial explosion of candidate locus combinations. Although multifactor dimensionality reduction (MDR) is a widely used method for evaluating epistasis, exhaustive MDR-based searches become computationally infeasible as the number of loci or the interaction order increases. In this paper, we define the epistasis detection problem as a black-box optimization problem and solve it with a factorization machine with quadratic-optimization annealing (FMQA). We propose an efficient epistasis detection method based on FMQA, in which the classification error rate (CER) computed by MDR is used as a black-box objective function. Experimental evaluations were conducted using simulated case-control datasets with predefined high-order epistasis. The results demonstrate that the proposed method successfully identified ground-truth epistasis across various interaction orders and the numbers of genetic loci within a limited number of iterations. These results indicate that the proposed method is effective and computationally efficient for high-order epistasis detection.
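The black-box objective described above can be sketched in a few lines. This is a simplified illustration, not the authors' code: it computes the training classification error rate of an MDR-style rule (a genotype cell is high-risk when its case/control ratio is at least the overall ratio) without the cross-validation that MDR normally uses, and it assumes both classes are present. The toy dataset, function name, and the exhaustive pair search are hypothetical; the exhaustive search is the step that an optimizer such as FMQA is meant to replace.

```python
from collections import defaultdict
from itertools import combinations

def mdr_classification_error(genotypes, labels, loci):
    """Training CER of an MDR-style high/low-risk rule on the chosen loci.

    genotypes: list of per-sample genotype tuples; labels: 1 = case, 0 = control.
    """
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    cells = defaultdict(lambda: [0, 0])  # genotype combination -> [controls, cases]
    for sample, label in zip(genotypes, labels):
        cells[tuple(sample[i] for i in loci)][label] += 1
    errors = 0
    for controls, cases in cells.values():
        # High-risk cell: case/control ratio at least the overall ratio.
        high_risk = cases * n_controls >= controls * n_cases
        # High-risk cells misclassify their controls, low-risk cells their cases.
        errors += controls if high_risk else cases
    return errors / len(labels)

# Toy data (made up): the label is the XOR of loci 0 and 1; locus 2 is noise.
genotypes = [(0, 0, 0), (0, 1, 1), (1, 0, 0), (1, 1, 1),
             (0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 0)]
labels = [0, 1, 1, 0, 0, 1, 1, 0]

# Exhaustive search over locus pairs, feasible only for tiny problems.
best_pair = min(combinations(range(3), 2),
                key=lambda loci: mdr_classification_error(genotypes, labels, loci))
print(best_pair, mdr_classification_error(genotypes, labels, best_pair))
```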

2601.00417 2026-05-14 cs.LG cs.AI cs.CL cs.CV

Deep Delta Learning

Yifan Zhang, Yifeng Liu, Mengdi Wang, Quanquan Gu

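Affiliations Princeton University; University of California, Los Angeles

AI summary This paper proposes Deep Delta Learning (DDL), a residual update rule for the residual stream of Transformer models. Unlike standard additive accumulation, DDL lets each layer selectively rewrite residual content: it reads the current state along a learned direction, compares it with a target value, and applies a gated correction along the same direction. Experiments on language models show that these rewrite operations improve language modeling quality over pure residual addition.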

Comments Project Page: https://github.com/yifanzhang-pro/deep-delta-learning

English abstract

Transformer residual streams evolve by additive accumulation: each layer appends a feature update to a shared hidden state, but has no direct mechanism for replacing content that has become obsolete or conflicting. We introduce Deep Delta Learning (DDL), a residual update rule that preserves the identity path while giving every layer the ability to selectively rewrite residual content. DDL reads the current state along a learned direction, compares it with a learned target value, and writes back a gated correction along the same direction. When the gate is closed, the update reduces to the identity; when the gate is fully open, the selected component is overwritten, yielding a depth-wise delta-rule generalization of standard residual addition. We integrate DDL in decoder-only language models with both scalar and expanded residual states, while keeping attention and MLP sublayers at the original compute width. Controlled pretraining and downstream evaluations show that residual rewrite operations improve language modeling quality relative to pure additive accumulation introduced in ResNet, suggesting that a learned delta-rule update is an effective mechanism for managing Transformer residual streams.
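The read, compare, and write-back steps can be sketched as a classical delta-rule update on a vector state. This is a minimal sketch under assumptions: the direction `k`, target `v`, and gate `beta` are passed in by hand, whereas in the paper they would be produced by learned components whose parameterization the abstract does not give. Normalizing `k` to unit length is also an assumption, chosen so that a fully open gate overwrites the selected component exactly.

```python
import math

def delta_residual_update(x, k, v, beta):
    """Gated delta-rule update of state x along direction k toward target v."""
    norm = math.sqrt(sum(c * c for c in k))
    u = [c / norm for c in k]                    # unit-length read/write direction
    read = sum(ui * xi for ui, xi in zip(u, x))  # current component of x along u
    # Write back a gated correction along the same direction.
    return [xi + beta * (v - read) * ui for xi, ui in zip(x, u)]

x = [1.0, 2.0, 3.0]
k = [0.0, 2.0, 0.0]   # hypothetical direction: selects the second coordinate
closed = delta_residual_update(x, k, v=5.0, beta=0.0)  # gate closed: identity
opened = delta_residual_update(x, k, v=5.0, beta=1.0)  # gate open: overwrite
print(closed, opened)
```

With the gate closed the state passes through unchanged, and with the gate open the component along the direction is replaced by the target while orthogonal components are untouched, matching the two limiting cases the abstract describes.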

2512.18951 2026-05-14 cs.LG

Benchmarking Attribute Discrimination in Infant-Scale Vision-Language Models

Patrick Batsell, Satoshi Tsutsui, Bihan Wen

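Affiliations Rice University; ROSE Lab; School of EEE, Nanyang Technological University

AI summary This study examines whether infant-scale vision-language models can discriminate fine-grained visual attributes such as color, size, and texture, and introduces a controlled benchmark that uses synthetic rendering to decouple attributes from object identity. It compares infant-trained models with web-scale models on an image-only prototype test and a text-vision test. Infant-trained models do well on visual size and texture discrimination but poorly on color, and they struggle to ground color in the text-vision setting, whereas web-scale models ground color more strongly.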

English abstract

Infants learn not only object categories but also fine-grained visual attributes such as color, size, and texture from limited experience. Prior infant-scale vision-language models have mainly been evaluated on object recognition, leaving open whether they support within-class attribute discrimination. We introduce a controlled benchmark that varies color, size, and texture across 67 everyday object classes using synthetic rendering to decouple attribute values from object identity. We evaluate infant-trained models (CVCL and an infant-trained DINO baseline) against web-scale and ImageNet models (CLIP, SigLIP, ResNeXt) under two complementary settings: an image-only prototype test and a text-vision test with attribute-object prompts. We find a dissociation between visual and linguistic attribute information: infant-trained models form strong visual representations for size and discriminate texture comparably to other models, but perform poorly on visual color discrimination, and in the text-vision setting they struggle to ground color and show only modest size grounding. In contrast, web-trained vision-language models strongly ground color from text while exhibiting weaker visual size discrimination.

2512.16960 2026-05-14 cs.LG quant-ph

QSMOTE-PGM/kPGM: QSMOTE Based PGM and kPGM for Imbalanced Dataset Classification

Bikash K. Behera, Giuseppe Sergioli, Roberto Giuntini

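Affiliations Università degli Studi di Cagliari; Technische Universität München; Institute for Advanced Study (IAS)

AI summary This paper studies combining the Quantum Synthetic Minority Oversampling Technique (QSMOTE) with two quantum-inspired classifiers, PGM and kPGM, to improve classification on imbalanced datasets. It proposes three QSMOTE variants based on different quantum-inspired mechanisms and compares PGM and kPGM theoretically and empirically under different quantum encoding strategies. The proposed approaches outperform a classical Random Forest baseline on recall and balanced F1-score; PGM with stereo encoding performs best, while kPGM is more stable.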

Comments 30 pages, 10 figures

English abstract

Quantum-inspired machine learning (QiML) employs mathematical principles from quantum theory, such as Hilbert-space representations and quantum state discrimination, to enhance classical learning algorithms. In this work, we investigate the integration of Quantum Synthetic Minority Oversampling Technique (QSMOTE) variants with two quantum-inspired classifiers: the Pretty Good Measurement (PGM) classifier and the kernelized Pretty Good Measurement (KPGM) classifier. We propose and analyze three QSMOTE variants, namely KNN-based, Fidelity-based, and Margin-based QSMOTE, designed to improve minority-class representation in imbalanced datasets through quantum-inspired similarity and sampling mechanisms. A unified theoretical and empirical comparison of PGM and KPGM is presented under amplitude and stereo encoding strategies with multiple quantum copies. Experimental evaluations on the Telco Customer Churn dataset demonstrate that the proposed quantum-inspired approaches consistently outperform a classical Random Forest baseline, particularly in terms of recall and balanced F1-score. Among all configurations, PGM with stereo encoding and n_{copies}=2 achieves the best performance with an accuracy of 0.8512 and an F1-score of 0.8234, while KPGM exhibits competitive and more stable behavior across different QSMOTE variants, reaching accuracies of 0.8511 under stereo encoding and 0.8483 under amplitude encoding. The results further show that increasing the number of quantum copies systematically improves classification performance, especially for minority-class detection. This work highlights the effectiveness of combining quantum-inspired oversampling and classification strategies for imbalanced learning, while providing practical insights into the complementary strengths of measurement-based and kernel-based quantum-inspired machine learning frameworks.

2512.13399 2026-05-14 cs.AI cs.CL

Differentiable Evolutionary Reinforcement Learning

Sitao Cheng, Tianle Li, Xuhan Huang, Xunjian Yin, Difan Zou

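Affiliations Department of XXX, University of YYY, Location, Country; School of ZZZ, Institute of WWW, Location, Country

AI summary This paper addresses the central problem of designing effective reward signals in reinforcement learning and proposes Differentiable Evolutionary Reinforcement Learning (DERL), a framework whose Meta-Optimizer couples optimization of the reward function's structure with policy learning. The Meta-Optimizer is updated with policy gradients derived from the inner-loop policy's validation performance, so the reward structure improves progressively. Experiments across several reasoning domains show that DERL outperforms non-differentiable baselines, especially in out-of-distribution generalization.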

Comments Work in Progress. We release our code and model at https://github.com/sitaocheng/DERL

English abstract

Crafting effective reward signals remains a central challenge in Reinforcement Learning (RL), especially for complex reasoning tasks. Existing automated reward optimization methods typically rely on derivative-free search heuristics that treat the reward function as a black box, failing to exploit the causal dynamics between reward structure modifications and policy performance. We introduce Differentiable Evolutionary Reinforcement Learning (DERL), a bi-level framework for the autonomous discovery of optimal reward structures. DERL employs a Meta-Optimizer that evolves a reward function through the composition of structured atomic primitives to guide an inner-loop policy. Unlike prior black-box methods, DERL introduces differentiability into the meta-optimization process by updating the Meta-Optimizer using policy gradients derived from inner-loop validation performance. This allows for the progressive learning of a "meta-gradient" for task success, providing the system with dense, actionable feedback. We validate DERL across diverse reasoning domains: embodied agent (ALFWorld), scientific simulation (ScienceWorld), and mathematical reasoning (GSM8K, MATH). Results show that DERL achieves state-of-the-art performance on agent benchmarks, substantially outperforming non-differentiable baselines, especially in out-of-distribution generalization. Trajectory analyses confirm that DERL captures the intrinsic causal structure of tasks, enabling fully autonomous, self-improving agent alignment.

2512.10857 2026-05-14 cs.LG cs.AI stat.ML

Generative Modeling from Black-box Corruptions via Self-Consistent Stochastic Interpolants

Chirag Modi, Jiequn Han, Eric Vanden-Eijnden, Joan Bruna

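Affiliations New York University; Flatiron Institute; Machine Learning Lab, Capital Fund Management

AI summary This paper studies how to build a generative model from data corrupted through a black-box channel. The authors propose the self-consistent stochastic interpolant (SCSI), which iteratively updates a transport map between corrupted and clean data using only the corrupted dataset and black-box access to the corruption channel, thereby inverting the channel at the level of distributions. The method is computationally efficient, flexible, and comes with theoretical guarantees, and it performs well on natural image processing and scientific reconstruction tasks.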

Comments Accepted at ICLR 2026

English abstract

Transport-based methods have emerged as a leading paradigm for building generative models from large, clean datasets. However, in many scientific and engineering domains, clean data are often unavailable: instead, we only observe measurements corrupted through a noisy, ill-conditioned channel. A generative model for the original data thus requires solving an inverse problem at the level of distributions. In this work, we introduce a novel approach to this task based on Stochastic Interpolants: we iteratively update a transport map between corrupted and clean data samples using only access to the corrupted dataset as well as black-box access to the corruption channel. Under appropriate conditions, this iterative procedure converges towards a self-consistent transport map that effectively inverts the corruption channel, thus enabling a generative model for the clean data. We refer to the resulting method as the self-consistent stochastic interpolant (SCSI). It (i) is computationally efficient compared to variational alternatives, (ii) is highly flexible, handling arbitrary nonlinear forward models with only black-box access, and (iii) enjoys theoretical guarantees. We demonstrate superior performance on inverse problems in natural image processing and scientific reconstruction, and establish convergence guarantees of the scheme under appropriate assumptions. Our source code is publicly available at https://github.com/modichirag/SCSI

2512.09675 2026-05-14 cs.CL

d-TreeRPO: Towards More Reliable Policy Optimization for Diffusion Language Models

Leyi Pan, Shuchang Tao, Yunpeng Zhai, Zheyu Fu, Liancheng Fang, Minghua He, Lingzhe Zhang, Zhaoyang Liu, Bolin Ding, Aiwei Liu, Lijie Wen

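Affiliations Tsinghua University; Tongyi Lab; University of Illinois at Chicago; Peking University

AI summary This paper proposes d-TreeRPO, a reliable policy optimization framework that aims to improve the reasoning ability of diffusion language models through reinforcement learning. To address reward sparsity and biased probability estimates in existing methods, d-TreeRPO uses tree-structured rollouts and bottom-up advantage computation based on verifiable outcome rewards to provide fine-grained, verifiable reward signals. It also introduces a time-scheduled self-distillation loss that raises prediction confidence and thus improves probability estimation; experiments show significant gains on multiple reasoning benchmarks.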

Comments ACL 2026 Main

English abstract

Reinforcement learning (RL) is pivotal for enhancing the reasoning capabilities of diffusion large language models (dLLMs). However, existing dLLM policy optimization methods suffer from two critical reliability bottlenecks: (1) reward sparsity, arising from coarse or unverifiable signals that impede accurate advantage calculation; and (2) probability estimates that do not account for the gap to the unbiased expectation over all decoding orders, which is intractable to compute. To mitigate these issues, we propose d-TreeRPO, a reliable RL framework for dLLMs that leverages tree-structured rollouts and bottom-up advantage computation based on verifiable outcome rewards to provide fine-grained and verifiable step-wise reward signals. Furthermore, we provide a theoretical proof demonstrating that increasing prediction confidence effectively minimizes the gap between unbiased expected prediction probabilities and their single-step forward-pass estimates. Guided by this analysis, we introduce a time-scheduled self-distillation loss during training that enhances prediction confidence in later training stages, thereby enabling more accurate probability estimation and better performance. Experiments demonstrate that d-TreeRPO outperforms existing baselines and achieves significant improvements across multiple reasoning benchmarks. Specifically, it achieves +86.2% on Sudoku, +51.6% on Countdown, +4.5% on GSM8K, and +5.3% on Math500 compared to the base model.

2512.08411 2026-05-14 cs.AI cs.RO

Prismatic World Model: Learning Compositional Dynamics for Planning in Hybrid Systems

Mingwei Li, Xiaoyuan Zhang, Chengwei Yang, Zilong Zheng, Yaodong Yang

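Affiliations Beijing Institute of Technology; Peking University; NLCo Lab, Beijing Institute for General Artificial Intelligence

AI summary In robotic planning, the hybrid dynamics of physical systems, where continuous motion alternates with discrete events, challenge model-based planning. This paper proposes PRISM-WM, a structured world model that decomposes complex hybrid dynamics into composable primitives and so avoids the over-smoothing of distinct dynamic modes seen in conventional models. It uses a context-aware Mixture-of-Experts framework with implicit mode identification and an expert-diversity constraint, which improves long-horizon prediction accuracy and yields strong trajectory optimization performance on several continuous control tasks.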

English abstract

Model-based planning in robotic domains is challenged by the hybrid nature of physical dynamics, where continuous motion is punctuated by discrete events such as contacts and impacts. Conventional latent world models typically employ monolithic neural networks that enforce global continuity, which over-smooths distinct dynamic modes (e.g., sticking vs. sliding, flight vs. stance). For a planner, this smoothing results in compounding errors during long-horizon lookaheads, rendering the search process unreliable at physical boundaries. To address this, we introduce the Prismatic World Model (PRISM-WM), a structured architecture designed to decompose complex hybrid dynamics into composable primitives. PRISM-WM uses a context-aware Mixture-of-Experts (MoE) framework where a gating mechanism implicitly identifies the current physical mode, and specialized experts predict the associated transition dynamics. We further introduce a latent orthogonalization objective to ensure expert diversity, preventing mode collapse. By modeling the mode transitions in system dynamics, PRISM-WM reduces rollout drift. Experiments on continuous control benchmarks, including high-dimensional humanoids and multi-task settings, demonstrate that PRISM-WM provides a high-fidelity substrate for trajectory optimization algorithms (e.g., TD-MPC), indicating its potential as a foundational model for model-based agents.
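The gating idea can be sketched with a dense softmax gate over two hand-written experts. Everything here is hypothetical: the `sliding` and `sticking` experts, the two-element state `[position, velocity]`, and the hand-set gate logits stand in for learned networks, and the abstract does not say whether PRISM-WM mixes experts densely or selects a subset.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def moe_transition(state, gate_logits, experts):
    """Blend expert predictions of the next state using softmax gate weights."""
    weights = softmax(gate_logits)
    outputs = [expert(state) for expert in experts]
    blended = [sum(w * out[i] for w, out in zip(weights, outputs))
               for i in range(len(state))]
    return blended, weights

# Hypothetical experts for two contact modes; state is [position, velocity].
def sliding(state, dt=0.1):
    return [state[0] + dt * state[1], state[1]]   # keeps moving

def sticking(state):
    return [state[0], 0.0]                        # velocity drops to zero

state = [1.0, 2.0]
# In a learned model the logits would come from a gating network, not by hand.
confident, w_confident = moe_transition(state, [50.0, -50.0], [sliding, sticking])
uncertain, w_uncertain = moe_transition(state, [0.0, 0.0], [sliding, sticking])
print(confident, uncertain)
```

A sharply peaked gate reproduces one mode's dynamics, while an undecided gate averages the two modes, which is the kind of over-smoothing across modes that the abstract attributes to monolithic models.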

2512.07775 2026-05-14 cs.RO

OptMap: Geometric Map Distillation via Submodular Maximization

David Thorne, Nathan Chan, Christa S. Robison, Philip R. Osteen, Brett T. Lopez

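Affiliations University of California, Los Angeles; DEVCOM Army Research Laboratory (ARL)

AI summary This paper presents OptMap, a geometric map distillation algorithm that addresses the computational and memory cost autonomous robots face when processing large volumes of LiDAR data. Using submodular maximization, it generates compact, application-specific maps in polynomial time, reducing computation while retaining informative content. A new submodular reward function and a dynamically reordered streaming algorithm improve map quality and efficiency; the method is evaluated on long-duration mapping sessions and applied to online geometric change detection.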

English abstract

Autonomous robots rely on geometric maps to inform a diverse set of perception and decision-making algorithms. As autonomy requires reasoning and planning on multiple scales, each algorithm may require a different map for optimal performance. LiDAR sensors generate an abundance of geometric data (up to 50 MB per second) to satisfy these diverse requirements. However, the point-based operations required to process perception data are both memory- and compute-intensive. Such operations can be bypassed via learned representations that encode similarity, but selecting informative, size-constrained maps remains an NP-hard combinatorial problem. In this work, we present OptMap: a geometric map distillation algorithm which achieves online, application-specific map generation via multiple theoretical and algorithmic innovations. A central feature is the maximization of set functions that exhibit diminishing returns, i.e., submodularity, using polynomial-time algorithms with provably near-optimal solutions. We formulate a novel submodular reward function which quantifies informativeness, reduces input set sizes, and minimizes solution bias. Further, we propose a dynamically reordered streaming submodular algorithm which improves empirical solution quality and addresses input order bias via an online approximation of the value of all scans. Testing was conducted on open-source and custom datasets with an emphasis on long-duration mapping sessions, highlighting OptMap's minimal computation requirements. OptMap's practical value is then illustrated through its application to online geometric change detection. Open-source ROS1 and ROS2 packages are available and can be used alongside any LiDAR odometry algorithm.
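Diminishing returns and near-optimal polynomial-time selection can be illustrated with the classic offline greedy algorithm for a monotone submodular function under a cardinality constraint, which carries the well-known (1 - 1/e) approximation guarantee. This is a generic sketch, not OptMap's streaming algorithm or its reward function: the coverage objective over made-up region IDs and the scan names are stand-ins chosen for illustration.

```python
def greedy_select(scan_coverage, budget):
    """Greedily pick up to `budget` scans maximizing the number of covered regions.

    scan_coverage: dict mapping a scan name to the set of region IDs it observes.
    """
    selected, covered = [], set()
    candidates = dict(scan_coverage)
    while candidates and len(selected) < budget:
        # Marginal gain of a scan = number of regions it adds beyond those covered.
        best = max(candidates, key=lambda name: len(candidates[name] - covered))
        if not candidates[best] - covered:
            break  # no remaining scan adds new information
        selected.append(best)
        covered |= candidates.pop(best)
    return selected, covered

# Hypothetical scans and the region IDs each one observes.
scans = {"a": {1, 2, 3, 4}, "b": {3, 4, 5}, "c": {5, 6}, "d": {1, 2}}
chosen, covered = greedy_select(scans, budget=2)
print(chosen, sorted(covered))
```

Scan "b" looks useful in isolation (three regions) but adds only one new region once "a" is chosen, which is the diminishing-returns property the abstract refers to.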

2512.07112 2026-05-14 cs.LG cs.AI

FOAM: Blocked State Folding for Memory-Efficient LLM Training

Ziqing Wen, Jiahuan Wang, Ping Luo, Dongsheng Li, Tao Sun

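Affiliations National Key Laboratory of Parallel and Distributed Computing; College of Computer Science and Technology; National University of Defense Technology

AI summary This paper proposes FOAM, an optimizer that improves memory efficiency when training large language models. It compresses optimizer states by computing block-wise gradient means and adds a residual correction to recover lost information, sharply reducing memory use while preserving model performance. Experiments show that FOAM removes up to 90% of optimizer-state memory overhead without sacrificing convergence speed, is compatible with other memory-efficient optimizers, and matches or surpasses existing methods.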

English abstract

Large language models (LLMs) have demonstrated remarkable performance due to their large parameter counts and extensive training data. However, their scale leads to significant memory bottlenecks during training, especially when using memory-intensive optimizers like Adam. Existing memory-efficient approaches often rely on techniques such as singular value decomposition (SVD), projections, or weight freezing, which can introduce substantial computational overhead, require additional memory for projections, or degrade model performance. In this paper, we propose Folded Optimizer with Approximate Moment (FOAM), a method that compresses optimizer states by computing block-wise gradient means and incorporates a residual correction to recover lost information. Theoretically, FOAM achieves convergence rates equivalent to vanilla Adam under standard non-convex optimization settings. Empirically, FOAM eliminates up to 90% of the memory overhead of optimizer states and accelerates convergence. Furthermore, FOAM is compatible with other memory-efficient optimizers, delivering performance and throughput that match or surpass both full-rank and existing memory-efficient baselines. Code is available at https://github.com/zqOuO/FOAM.
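Block-wise mean compression can be sketched as a fold and unfold pair on a flat gradient. This is an illustration under assumptions rather than FOAM itself: the abstract does not say how the folded values enter Adam's moment estimates or how the residual correction is applied, so the code only shows the folding step and the residual it leaves behind. The function names and numbers are hypothetical.

```python
def fold(values, block_size):
    """Compress a flat list into one mean per contiguous block."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return [sum(block) / len(block) for block in blocks]

def unfold(means, block_size, length):
    """Expand block means back to the original length by repetition."""
    return [means[i // block_size] for i in range(length)]

grad = [1.0, 3.0, 2.0, 6.0, -1.0, 1.0]       # made-up gradient entries
means = fold(grad, block_size=2)              # stored state: 3 values instead of 6
approx = unfold(means, block_size=2, length=len(grad))
residual = [g - a for g, a in zip(grad, approx)]  # information lost by folding
print(means, residual)
```

With block size b the folded state holds roughly 1/b of the original entries, and adding the residual back to the unfolded means recovers the gradient exactly.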

2512.07091 2026-05-14 cs.RO

SCU-Hand with Integrated Single-Sheet Valve: A Funnel-Shaped Robotic Hand for Milligram-Scale Powder Handling

Tomoya Takahashi, Yusaku Nakajima, Cristian Camilo Beltran-Hernandez, Yuki Kuroda, Kazutoshi Tanaka, Masashi Hamaya, Kanta Ono, Yoshitaka Ushiku

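Affiliations OMRON SINIC X Corporation; The University of Osaka

AI summary This study proposes SCU-Hand-SV, a funnel-shaped soft robotic hand with an integrated single-sheet valve for precise handling of milligram-scale powder. A controllable valve at the cone apex, combined with a feedback control system based on a powder flow model, enables precise incremental dispensing and weighing. In experiments on several powder materials, 80% of trials achieved an error within ±2 mg, improving the efficiency and flexibility of powder handling in laboratory automation.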

Comments 8 pages, 8 figures

English abstract

Laboratory Automation (LA) has the potential to accelerate solid-state materials discovery by enabling continuous robotic operation without human intervention. While robotic systems have been developed for tasks such as powder grinding and X-ray diffraction (XRD) analysis, fully automating powder handling at the milligram scale remains a significant challenge due to the complex flow dynamics of powders and the diversity of laboratory tasks. To address this challenge, this study proposes the SCU-Hand-SV (Soft Conical Universal Robotic Hand with Single-sheet Valve), which preserves the softness and conical sheet designs in prior work while incorporating a controllable valve at the cone apex to enable precise, incremental dispensing of milligram-scale powder quantities. The SCU-Hand-SV is integrated with an external balance through a feedback control system based on a model of powder flow and online parameter identification. Experimental evaluations with glass beads, monosodium glutamate, and titanium dioxide demonstrated that 80% of the trials achieved an error within -2 mg to +2 mg, and the maximum error observed was approximately 20 mg across a target range of 20 mg to 3 g. In addition, by incorporating flow prediction models commonly used for hoppers and performing online parameter identification, the system is able to adapt to variations in powder dynamics. Compared to direct PID control, the proposed model-based control significantly improved both accuracy and convergence speed. These results highlight the potential of the proposed system to enable efficient and flexible powder weighing, with scalability toward larger quantities and applicability to a broad range of laboratory automation tasks.

2512.02764 2026-05-14 cs.CL

PEFT-Factory: Unified Parameter-Efficient Fine-Tuning of Autoregressive Large Language Models

Robert Belanec, Ivan Srba, Maria Bielikova

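Affiliations Faculty of Information Technology, Brno University of Technology; Kempelen Institute of Intelligent Technologies

AI summary This paper presents PEFT-Factory, a unified framework for parameter-efficient fine-tuning of autoregressive large language models. It supports both off-the-shelf and custom PEFT methods and bundles classification and text generation datasets with matching evaluation metrics, improving replicability and benchmarking. Built on the popular LLaMA-Factory, PEFT-Factory provides a ready-to-use, controlled, and stable experimental environment.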

English abstract

Parameter-Efficient Fine-Tuning (PEFT) methods address the increasing size of Large Language Models (LLMs). Currently, many newly introduced PEFT methods are challenging to replicate, deploy, or compare with one another. To address this, we introduce PEFT-Factory, a unified framework for efficient fine-tuning of LLMs using both off-the-shelf and custom PEFT methods. While its modular design supports extensibility, it natively provides a representative set of 19 PEFT methods, 27 classification and text generation datasets addressing 12 tasks, and both standard and PEFT-specific evaluation metrics. As a result, PEFT-Factory provides a ready-to-use, controlled, and stable environment, improving replicability and benchmarking of PEFT methods. PEFT-Factory is a downstream framework that originates from the popular LLaMA-Factory, and is publicly available at https://github.com/kinit-sk/PEFT-Factory.

2512.01707 2026-05-14 cs.CV cs.AI cs.CL

StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos

Daeun Lee, Subhojyoti Mukherjee, Branislav Kveton, Ryan A. Rossi, Viet Dac Lai, Seunghyun Yoon, Trung Bui, Franck Dernoncourt, Mohit Bansal

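Affiliations University of North Carolina, Chapel Hill; Adobe Research

AI summary StreamGaze is a new benchmark for evaluating how well multimodal large language models use human gaze signals for temporal reasoning and proactive understanding in streaming video. It introduces gaze-guided past, present, and proactive tasks that assess a model's ability to process a video stream in real time and anticipate user intention. The authors build a question-answer generation pipeline that aligns gaze trajectories with video content to produce spatio-temporally grounded QA pairs, and they find that current models still fall well short on gaze-based temporal reasoning and proactive prediction.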

Comments Accepted to CVPR 2026 with strong scores (5/5/5) but desk-rejected after the camera-ready due to not completing all reviewing duties

English abstract

Streaming video understanding requires models not only to process temporally incoming frames, but also to anticipate user intention for realistic applications such as Augmented Reality (AR) glasses. While prior streaming benchmarks evaluate temporal reasoning, none measure whether Multimodal Large Language Models (MLLMs) can interpret or leverage human gaze signals within a streaming setting. To fill this gap, we introduce StreamGaze, the first benchmark designed to evaluate how effectively MLLMs utilize gaze for temporal and proactive reasoning in streaming videos. StreamGaze introduces gaze-guided past, present, and proactive tasks that comprehensively assess streaming video understanding. These tasks evaluate whether models can use real-time gaze signals to follow shifting attention and infer user intentions based only on past and currently observed frames. To build StreamGaze, we develop a gaze-video Question Answering (QA) generation pipeline that aligns egocentric videos with raw gaze trajectories through fixation extraction, region-specific visual prompting, and scanpath construction. This pipeline produces spatio-temporally grounded QA pairs that reflect human perceptual dynamics. Across all StreamGaze tasks, we observe substantial performance gaps between state-of-the-art MLLMs and human performance, highlighting key limitations in gaze-based temporal reasoning, intention modeling, and proactive prediction. We further provide detailed analyses of gaze prompting strategies, reasoning behaviors, and task-specific failure modes, offering insights into current limitations and directions for future research. All data and code are publicly available to support continued research in gaze-guided streaming video understanding.

2512.01242 2026-05-14 cs.CV cs.AI cs.CL

When Diffusion Breaks Constraints: Sequential Autoregressive Generation with RL and MCTS

Zirui Zhao, Boye Niu, Harold Soh, David Hsu, Wee Sun Lee

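Affiliations Salesforce AI Research; University of Sydney; National University of Singapore

AI summary This paper studies the limitations of diffusion models on constrained generation tasks such as multi-robot planning, molecular generation, and scene synthesis, which must satisfy strict geometric or physical constraints. The authors reformulate constrained generation as discrete autoregressive sequential generation, using reinforcement learning to improve feasibility and task success and Monte Carlo tree search to quantify the value of look-ahead. Experiments show better feasibility and task success than conventional diffusion models, suggesting a new direction for this class of constrained-generation problems.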

English abstract

Data-driven generative models excel in language and vision, but diffusion models often fail in constrained planning and design tasks, exhibiting severe constraint violations in engineering inverse design, molecular generation, multi-robot planning, and floorplan/scene synthesis even with projection or guidance. Such tasks combine hard-to-specify semantic goals with strict geometric or physical constraints (e.g., non-overlap, connectivity), yielding feasible solutions that lie on low-dimensional, small, and sometimes disconnected regions of the output space. This paper studies the failure mode through tangram generation from language, where seven fixed shapes must form a text-described silhouette while remaining connected and non-overlapping, and a simplified rectangle composition task with a learned bounding-box constraint. We find diffusion models struggle to satisfy constraints, consistent with difficulty generating samples near low-dimensional submanifolds. Motivated by locally feasible reparameterizations, we reformulate constrained generation as discrete autoregressive sequential generation. Reinforcement learning improves feasibility and task success, and Monte Carlo tree search quantifies the value of look-ahead when feasible regions shrink. Overall, the empirical, theoretical, and prior-work evidence points to a structural limitation of continuous density matching on this class of constrained-generation problems, and suggests sequential constraint-aware generation as a promising alternative.

2511.21285 2026-05-14 cs.CL

PEFT-Bench: A Parameter-Efficient Fine-Tuning Methods Benchmark

Robert Belanec, Branislav Pecher, Ivan Srba, Maria Bielikova

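Affiliations Faculty of Information Technology, Brno University of Technology; Kempelen Institute of Intelligent Technologies

AI summary Although large language models perform well on many tasks, their scale brings high computational and environmental costs that limit their use. Parameter-efficient fine-tuning (PEFT) methods address this by reducing the number of trainable parameters while maintaining strong downstream performance. This paper introduces PEFT-Bench, a unified end-to-end benchmark for evaluating PEFT methods that covers 27 NLP datasets and 7 PEFT methods, together with the PEFT Soft Cost Penalties (PSCP) metric, which accounts for trainable parameters, inference speed, and training memory usage.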

详情
英文摘要

Despite the state-of-the-art performance of Large Language Models (LLMs) achieved on many tasks, their massive scale often leads to high computational and environmental costs, limiting their accessibility. Parameter-Efficient Fine-Tuning (PEFT) methods address this challenge by reducing the number of trainable parameters while maintaining strong downstream performance. Despite the advances in PEFT methods, current evaluations remain limited (in terms of evaluated models and datasets) and difficult to reproduce. To bridge this gap, we introduce PEFT-Bench, a unified end-to-end benchmark for evaluating diverse PEFT methods on autoregressive LLMs. We demonstrate its usage across 27 NLP datasets and 7 PEFT methods. To account for different PEFT training and inference factors, we also introduce the PEFT Soft Cost Penalties (PSCP) metric, which takes trainable parameters, inference speed, and training memory usage into account.

2511.17031 2026-05-14 cs.LG cs.CV cs.CY

Energy Scaling Laws for Diffusion Models: Quantifying Compute in Image Generation

Aniketh Iyengar, Jiaqi Han, Boris Ruf, Vincent Grari, Marcin Detyniecki, Stefano Ermon

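Affiliations Stanford University; AXA AI Research

AI summary This paper studies energy scaling laws for diffusion models in image generation, aiming to quantify energy consumption across different model configurations and hardware setups. The authors adapt Kaplan scaling laws to diffusion models and predict GPU energy consumption from computational complexity (FLOPs), working from the hypothesis that the iterative denoising process dominates energy consumption. Extensive tests on several state-of-the-art diffusion models and GPU architectures show that the approach achieves high predictive accuracy within a single architecture and generalizes well across architectures, providing a foundation for energy assessment in sustainable AI deployment.

Comments Accepted at ACM Conference on Fairness, Accountability, and Transparency (FAccT) 2026

English abstract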

The rapidly growing computational demands of diffusion models for image generation have raised significant concerns about energy consumption and environmental impact. While existing approaches to energy optimization focus on architectural improvements or hardware acceleration, there is a lack of principled methods to predict energy consumption across different model configurations and hardware setups. We propose an adaptation of Kaplan scaling laws to predict GPU energy consumption for diffusion models based on computational complexity (FLOPs). Our approach decomposes diffusion model inference into text encoding, iterative denoising, and decoding components, with the hypothesis that denoising operations dominate energy consumption due to their repeated execution across multiple inference steps. We conduct comprehensive experiments across four state-of-the-art diffusion models (Stable Diffusion 2, Stable Diffusion 3.5, Flux, and Qwen) on three GPU architectures (NVIDIA A100, A4000, A6000), spanning various inference configurations including resolution ($256^2$--$1024^2$), precision (fp16/fp32), step counts (10--50), and classifier-free guidance settings. Our energy scaling law achieves high predictive accuracy within individual architectures ($R^2 > 0.9$) and exhibits strong cross-architecture generalization, maintaining high rank correlations across models and enabling reliable energy estimation for unseen model--hardware combinations. These results validate the compute-bound nature of diffusion inference and establish energy consumption estimation as a necessary foundation for sustainable AI deployment planning and subsequent carbon footprint assessment.
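As a rough illustration of predicting energy from FLOPs, the sketch below fits a single-term power law E ≈ a · FLOPs^b by least squares in log-log space. The abstract does not state the functional form the authors fit, so the power-law form, the function names, and the synthetic measurements are all assumptions made for illustration only.

```python
import math

def fit_power_law(flops, energy_joules):
    """Least-squares fit of log(E) = log(a) + b * log(FLOPs); returns (a, b)."""
    xs = [math.log(f) for f in flops]
    ys = [math.log(e) for e in energy_joules]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope = scaling exponent
    a = math.exp(mean_y - b * mean_x)   # intercept = multiplicative constant
    return a, b

def predict_energy(a, b, flops):
    """Predicted energy for a configuration with the given FLOP count."""
    return a * flops ** b

# Hypothetical, synthetic data generated from an exact power law (not measurements).
flops = [1e12, 5e12, 2e13, 1e14]
energy = [3e-10 * f ** 0.95 for f in flops]

a, b = fit_power_law(flops, energy)
estimate = predict_energy(a, b, 5e13)   # estimate for an unseen FLOP count
```

On real measurements the fit would be noisy, and a separate (a, b) pair per GPU architecture would be the natural starting point given the per-architecture accuracy the abstract reports.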

2511.16868 2026-05-14 cs.CV q-bio.BM

The Joint Gromov Wasserstein Objective for Multiple Object Matching

Aryan Tajmir Riahi, Khanh Dao Duc

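Affiliations Department of Computer Science, University of British Columbia; Department of Mathematics, University of British Columbia

AI summary This paper proposes the Joint Gromov-Wasserstein (JGW) objective for matching multiple objects, lifting the restriction of the traditional Gromov-Wasserstein distance to pairwise matching between single objects. The method extends the original framework to match collections of objects simultaneously and provides a non-negative dissimilarity measure with point sampling convergence. Benchmarks indicate that it outperforms other GW variants for partial matching in accuracy and computational efficiency, and experiments on synthetic and real-world datasets show its effectiveness for multiple shape matching, including geometric shapes and biomolecular complexes.

English abstract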

The Gromov-Wasserstein (GW) distance serves as a powerful tool for matching objects in metric spaces. However, its traditional formulation is constrained to pairwise matching between single objects, limiting its utility in scenarios and applications requiring multiple-to-one or multiple-to-multiple object matching. In this paper, we introduce the Joint Gromov-Wasserstein (JGW) objective and extend the original framework of GW to enable simultaneous matching between collections of objects. Our formulation provides a non-negative dissimilarity measure that identifies partially isomorphic distributions of mm-spaces, with point sampling convergence. We also show that the objective can be formulated and solved for point cloud representations by adapting traditional algorithms in Optimal Transport, including entropic regularization. Our benchmarking with other variants of GW for partial matching indicates superior performance in accuracy and computational efficiency of our method, while experiments on both synthetic and real-world datasets show its effectiveness for multiple shape matching, including geometric shapes and biomolecular complexes, suggesting promising applications for solving complex matching problems across diverse domains, including computer graphics and atomic model building for structural biology.
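For orientation, the standard pairwise GW problem that the paper extends can be written as follows for two point clouds with weights $p$ and $q$. This is the usual squared-loss discrete formulation, given here as background; it is not the JGW objective itself, which the abstract does not spell out.

```latex
\mathrm{GW}(X, Y) \;=\; \min_{T \in \Pi(p, q)} \sum_{i,k} \sum_{j,l}
  \bigl| d_X(x_i, x_k) - d_Y(y_j, y_l) \bigr|^2 \, T_{ij} \, T_{kl},
\qquad
\Pi(p, q) \;=\; \bigl\{ T \ge 0 \;:\; T \mathbf{1} = p,\; T^{\top} \mathbf{1} = q \bigr\}.
```

Here $T_{ij}$ is the mass transported from $x_i$ to $y_j$, and the objective penalizes couplings that fail to preserve pairwise distances within each space.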

2511.15743 2026-05-14 cs.LG astro-ph.EP astro-ph.IM

Connecting the Dots: A Machine Learning Ready Dataset for Ionospheric Forecasting Models

Linnea M. Wolniewicz, Halil S. Kelebek, Simone Mestici, Michael D. Vergalla, Giacomo Acciarini, Bala Poduval, Olga Verkhoglyadova, Madhulika Guhathakurta, Thomas E. Berger, Atılım Güneş Baydin, Frank Soboczenski

Affiliations Department of Information and Computer Science; University of Hawai‘i at Mānoa; Department of Engineering Science; University of Oxford; Università degli Studi di Roma Sapienza; Free Flight Research Lab; University of New Hampshire; European Space Agency (ESA); NASA Jet Propulsion Laboratory; NASA Headquarters; Space Weather Technology, Research, and Education Center; University of Colorado Boulder; Department of Computer Science; University of York & King’s College London

AI summary This paper presents a machine learning-ready dataset for ionospheric forecasting, addressing sparse observations, complex coupling across geospatial layers, and the growing need for timely, accurate predictions. The dataset integrates diverse ionospheric and heliospheric measurements, including Solar Dynamics Observatory data, solar wind parameters, geomagnetic activity indices, and total electron content from Global Navigation Satellite System receivers and smartphones, aligned temporally and spatially into a single modular structure. It provides a basis for next-generation ionospheric forecasting models, supports both physical and data-driven modeling, and offers a rich resource for exploring Sun-Earth interactions.

Comments 8 pages, 2 figures, 2 tables. Accepted as a poster presentation in the Machine Learning for the Physical Sciences workshop at NeurIPS 2025. Dataset can be found on Zenodo (https://zenodo.org/records/18343833) or GitHub (https://github.com/FrontierDevelopmentLab/2025-HL-Ionosphere-dataset)

English abstract

Operational forecasting of the ionosphere remains a critical space weather challenge due to sparse observations, complex coupling across geospatial layers, and a growing need for timely, accurate predictions that support Global Navigation Satellite System (GNSS), communications, aviation safety, as well as satellite operations. As part of the 2025 NASA Heliolab, we present a curated, open-access dataset that integrates diverse ionospheric and heliospheric measurements into a coherent, machine learning-ready structure, designed specifically to support next-generation forecasting models and address gaps in current operational frameworks. Our workflow integrates a large selection of data sources comprising Solar Dynamics Observatory data, solar irradiance indices (F10.7), solar wind parameters (velocity and interplanetary magnetic field), geomagnetic activity indices (Kp, AE, SYM-H), and NASA JPL's Global Ionospheric Maps of Total Electron Content (GIM-TEC). We also implement geospatially sparse data such as the TEC derived from the World-Wide GNSS Receiver Network and crowdsourced Android smartphone measurements. This novel heterogeneous dataset is temporally and spatially aligned into a single, modular data structure that supports both physical and data-driven modeling. Leveraging this dataset, we train and benchmark several spatiotemporal machine learning architectures for forecasting vertical TEC under both quiet and geomagnetically active conditions. This work presents an extensive dataset and modeling pipeline that enables exploration of not only ionospheric dynamics but also broader Sun-Earth interactions, supporting both scientific inquiry and operational forecasting efforts.

2511.14056 2026-05-14 cs.LG cs.AI cs.IT math.DG math.IT stat.ML

Radial Compensation: Fixing Radius Distortion in Chart-Based Generative Models on Riemannian Manifolds

Marios Papamichalis, Regina Ruane

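Affiliations Human Nature Lab, Yale University; Department of Statistics and Data Science, The Wharton School, University of Pennsylvania

AI summary This paper studies the base distribution in chart-based generative models on Riemannian manifolds. Standard methods sample in Euclidean tangent space and then map to the manifold, but this distorts geodesic distance: the same tangent-space scale can correspond to different geodesic radii under different charts, curvatures, and dimensions. The authors propose Radial Compensation (RC), which designs the base distribution so that the model realizes a user-specified distribution of the geodesic radius, yielding more stable training and cleaner curvature estimates. The paper also introduces balanced exponential charts, which improve numerical conditioning and decouple the statistical meaning of the model from its numerical computation, making the model easier to interpret.

English abstract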

We study the base distribution in chart-based generative models on Riemannian manifolds. Standard methods sample in Euclidean tangent space and then map the sample to the manifold with a chart. This is convenient, but it changes the meaning of distance: the same tangent-space scale can correspond to different geodesic radii, i.e. shortest-path distances from a reference point on the manifold, under different charts, curvatures, and dimensions. Within isotropic, scalar-Jacobian azimuthal charts, we show that no base distribution can simultaneously preserve geodesic-radial likelihoods, chart-invariant radial Fisher information, and tangent-space isotropy unless it has a specific form, which we call Radial Compensation (RC). RC chooses the tangent-space base so that the model realizes a user-specified one-dimensional law for the geodesic radius, and leaves the chart available as a numerical preconditioner. This gives more stable training and cleaner curvature estimates, because curvature no longer has to compensate for distortions introduced by the chart. We also introduce balanced exponential charts, which improve conditioning without changing the realized manifold density under RC. This decouples the statistical meaning of the model, the law of the geodesic radius, from its numerical conditioning, which is governed by the chart Jacobian: chart choice becomes a numerical preconditioner rather than a hidden modeling decision. Across manifold variational autoencoders and continuous normalizing flows, RC matches the intended radius behavior, improves numerical stability, and makes learned curvature easier to interpret.

2511.13658 2026-05-14 cs.CL cs.LG

Why is "Chicago" Predictive of Deceptive Reviews? Using LLMs to Discover Language Phenomena from Lexical Cues

Jiaming Qu, Mengtian Guo, Yue Wang

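Affiliations Amazon; UNC Chapel Hill

AI summary This paper studies why words such as "Chicago" are predictive of deceptive reviews and explores how large language models (LLMs) can translate unintuitive lexical cues into human-understandable language phenomena. The authors propose a conjecture-then-validate framework; the phenomena it discovers are grounded in data, generalize across similar domains, and are more predictive than phenomena derived from LLMs' prior knowledge or in-context learning. The approach can help people judge the credibility of online reviews where deception detection classifiers are unavailable.

English abstract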

Deceptive reviews mislead consumers, harm businesses, and undermine trust in online marketplaces. Machine learning classifiers can learn from large amounts of data to distinguish deceptive reviews from genuine ones. However, the distinguishing features learned by these classifiers are often subtle, fragmented, and difficult for humans to interpret, which can hinder user understanding and trust. In this work, we study whether large language models (LLMs) can translate such unintuitive lexical cues into human-understandable language phenomena. We propose a conjecture-then-validate framework, and show that language phenomena obtained in this manner are empirically grounded in data, generalizable across similar domains, and more predictive than phenomena derived from LLMs' prior knowledge or in-context learning. Such phenomena can aid people in critically assessing the credibility of online reviews in environments where deception detection classifiers are unavailable.

2510.21060 2026-05-14 cs.LG cs.AI

On the Sample Complexity of Differentially Private Policy Optimization

Yi He, Xingyu Zhou

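Affiliations Wayne State University

AI summary This paper studies the sample complexity of differentially private policy optimization, examining the theoretical foundations of policy optimization in reinforcement learning under privacy constraints. The authors formalize a definition of differential privacy tailored to policy optimization, analyze the sample complexity of widely used algorithms such as policy gradient and natural policy gradient under privacy constraints, and show that privacy costs often appear as lower-order terms in the sample complexity. The results offer theoretical guidance for designing privacy-preserving policy optimization algorithms.

Comments Accepted at NeurIPS 2025

English abstract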

Policy optimization (PO) is a cornerstone of modern reinforcement learning (RL), with diverse applications spanning robotics, healthcare, and large language model training. The increasing deployment of PO in sensitive domains, however, raises significant privacy concerns. In this paper, we initiate a theoretical study of differentially private policy optimization, focusing explicitly on its sample complexity. We first formalize an appropriate definition of differential privacy (DP) tailored to PO, addressing the inherent challenges arising from on-policy learning dynamics and the subtlety involved in defining the unit of privacy. We then systematically analyze the sample complexity of widely-used PO algorithms, including policy gradient (PG), natural policy gradient (NPG) and more, under DP constraints and various settings, via a unified framework. Our theoretical results demonstrate that privacy costs can often manifest as lower-order terms in the sample complexity, while also highlighting subtle yet important observations in private PO settings. These offer valuable practical insights for privacy-preserving PO algorithms.

2510.19471 2026-05-14 cs.CL cs.LG eess.AS

Re-evaluating Minimum Bayes Risk Decoding for Automatic Speech Recognition

Yuu Jinnai

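Affiliations CyberAgent

AI summary This paper re-evaluates Minimum Bayes Risk (MBR) decoding for automatic speech recognition (ASR) and speech translation (ST). Experiments on English and Japanese data with Whisper and its derivative models find that MBR decoding outperforms beam search in most settings, making it a promising method for offline ASR and ST tasks that require high accuracy.

English abstract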

Recent work has shown that sample-based Minimum Bayes Risk (MBR) decoding outperforms beam search in text-to-text generation tasks, such as machine translation, text summarization, and image captioning. On the other hand, beam search is the current practice for speech-to-text tasks such as automatic speech recognition (ASR) and Speech Translation (ST). Given that MBR decoding is effective in text-to-text generation tasks, it is reasonable to expect it to also be effective for speech-to-text tasks. In this paper, we evaluate MBR decoding for ASR and ST tasks on English and Japanese using Whisper and its derivative models. We observe that the accuracy of MBR decoding outperforms that of beam search in most of the experimental settings we have evaluated. The results show that MBR decoding is a promising method for offline ASR and ST tasks that require high accuracy. The code is available at https://github.com/CyberAgentAILab/mbr-for-asr
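To make the idea of sample-based MBR decoding concrete, here is a minimal sketch: among a set of sampled transcripts, it returns the one with the highest total utility against all the samples, treating the samples as pseudo-references. The utility used here (negative word-level edit distance) and the sample sentences are assumptions for illustration; the abstract does not say which utility the paper uses, and the linked repository should be consulted for the actual implementation.

```python
def word_edit_distance(hyp, ref):
    """Word-level Levenshtein distance between two whitespace-tokenized strings."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))
    for i, hw in enumerate(h, start=1):
        cur = [i]
        for j, rw in enumerate(r, start=1):
            cur.append(min(prev[j] + 1,                 # delete a hypothesis word
                           cur[j - 1] + 1,              # insert a reference word
                           prev[j - 1] + (hw != rw)))   # substitute or match
        prev = cur
    return prev[-1]

def neg_edit_distance(hyp, ref):
    """Illustrative utility: fewer word edits means higher utility."""
    return -word_edit_distance(hyp, ref)

def mbr_decode(samples, utility):
    """Return the sample with the highest summed utility against all samples."""
    return max(samples, key=lambda hyp: sum(utility(hyp, ref) for ref in samples))

# Hypothetical transcripts sampled from an ASR model for one utterance.
samples = ["the cat sat", "the cat sat", "a cat sat", "the dog ran"]
best = mbr_decode(samples, neg_edit_distance)
```

The cost is quadratic in the number of samples, which is one reason the abstract frames MBR decoding as suited to offline settings.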

2510.19304 2026-05-14 cs.LG

Loopholing Discrete Diffusion: Deterministic Bypass of the Sampling Wall

Mingyu Jo, Jaesik Yoon, Justin Deschenaux, Caglar Gulcehre, Sungjin Ahn

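Affiliations KAIST; EPFL; Microsoft; SAP; NYU

AI summary Discrete diffusion models offer an alternative to autoregressive generation through parallel decoding, but they face a "sampling wall": once categorical sampling occurs, rich distributional information collapses into one-hot vectors and cannot be propagated to later steps, which limits generation quality. This paper proposes Loopholing, a mechanism that preserves this information through a deterministic latent pathway, yielding Loopholing Discrete Diffusion Models (LDDMs). The method substantially improves on existing baselines in generative perplexity, text coherence, and reasoning-task performance.

Comments Accepted at ICLR 2026

English abstract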

Discrete diffusion models offer a promising alternative to autoregressive generation through parallel decoding, but they suffer from a sampling wall: once categorical sampling occurs, rich distributional information collapses into one-hot vectors and cannot be propagated across steps, forcing subsequent steps to operate with limited information. To mitigate this problem, we introduce Loopholing, a novel and simple mechanism that preserves this information via a deterministic latent pathway, leading to Loopholing Discrete Diffusion Models (LDDMs). Trained efficiently with a self-conditioning strategy that avoids unrolling the full denoising trajectory, LDDMs achieve substantial gains, reducing generative perplexity by up to 61% over prior baselines, thereby closing (and in some cases surpassing) the gap with autoregressive models, and producing more coherent text. Applied to reasoning tasks, LDDMs also improve performance on arithmetic benchmarks such as Countdown and Game of 24. These results also indicate that loopholing mitigates idle steps and oscillations, providing a general and effective path toward high-quality non-autoregressive text generation.

2510.18245 2026-05-14 cs.LG cs.AI

Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs

Song Bian, Tao Yu, Shivaram Venkataraman, Youngsuk Park

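Affiliations UW-Madison; Amazon Web Services

AI summary This paper studies how model architecture affects the inference efficiency and accuracy of large language models (LLMs), examining hidden size, the allocation of parameters between MLP and attention, and grouped-query attention (GQA). The authors propose a conditional scaling law that incorporates architectural information, together with a search framework for finding architectures that are both accurate and inference-efficient. Experiments show that, under the same training budget, the resulting models achieve higher accuracy and inference throughput than existing open-source baselines.

Comments 32 pages, 27 figures

Journal ref ICLR 2026

English abstract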

Scaling the number of parameters and the size of training data has proven to be an effective strategy for improving large language model (LLM) performance. Yet, as these models grow increasingly powerful and widely deployed, the cost of inference has become a pressing concern. Despite its importance, the trade-off between model accuracy and inference efficiency remains underexplored. In this work, we examine how key architectural factors (hidden size, the allocation of parameters between MLP and attention (mlp-to-attention ratio), and grouped-query attention (GQA)) influence both inference cost and accuracy. We introduce a conditional scaling law that augments the Chinchilla framework with architectural information, along with a search framework for identifying architectures that are simultaneously inference-efficient and accurate. To validate our approach, we train more than 200 models spanning 80M to 3B parameters and 8B to 100B training tokens, and fit the proposed conditional scaling law. Our results show that the conditional scaling law reliably predicts optimal architectural choices and that the resulting models outperform existing open-source baselines. Under the same training budget, optimized architectures achieve up to 2.1% higher accuracy and 42% greater inference throughput compared to LLaMA-3.2.
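One reason grouped-query attention matters for inference cost is the size of the key-value cache, which scales with the number of KV heads rather than the number of query heads. The back-of-the-envelope sketch below shows that relationship; the function name and the model configuration are hypothetical, and the abstract does not say which cost measures the paper's scaling law uses.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Bytes needed to cache keys and values for every layer and position.

    The leading factor of 2 counts one key tensor and one value tensor per layer.
    bytes_per_value=2 corresponds to 16-bit storage.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)

# Hypothetical configuration: 32 layers, 32 query heads of dimension 128.
# Multi-head attention keeps one KV head per query head.
mha = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=4096)
# Grouped-query attention shares each KV head across a group of 4 query heads.
gqa = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=4096)

reduction = mha // gqa   # cache shrinks by the group size
```

In this configuration the cache drops from 2 GiB to 512 MiB per sequence, which is memory that can instead hold larger batches during decoding.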

2510.18114 2026-05-14 cs.LG cs.AI stat.ML

Latent-Augmented Discrete Diffusion Models

Dario Shariatian, Alain Durmus, Umut Simsekli, Stefano Peluchetti

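Affiliations Inria; PSL Research University; CMAP; Ecole Polytechnique Palaiseau, France; Sakana AI Tokyo, Japan

AI summary Discrete diffusion models show strong potential for language generation, but existing methods typically ignore cross-token dependencies, which degrades few-step performance. This paper proposes Latent-Augmented Discrete Diffusion (LADD), which introduces a learnable auxiliary latent channel and performs diffusion over the joint (token, latent) space, expressing joint structure while preserving tractable parameterizations. Experiments show that LADD improves on state-of-the-art masked discrete diffusion baselines on unconditional generation metrics and is effective at lower sampling budgets.

English abstract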

Discrete diffusion models have emerged as a powerful class of models and a promising route to fast language generation, but practical implementations typically rely on factored reverse transitions ignoring cross-token dependencies and degrading few-step performance. We propose Latent-Augmented Discrete Diffusion (LADD), which introduces a learnable auxiliary latent channel and performs diffusion over the joint (token, latent) space. The latent variables provide an intermediate representation expressing joint structure while preserving tractable parameterizations. We instantiate LADD with continuous latents (Co-LADD) and discrete latents (Di-LADD), and study two inference schedules: a joint diffusion that denoises data and latents together, and a sequential diffusion that first resolves latents and then samples tokens conditionally. We derive ELBO-style objectives and analyze design choices that balance latent expressivity with diffusion compatibility. In experiments, LADD models yield improvements on unconditional generation metrics as compared to state-of-the-art masked discrete diffusion baselines, and are effective at lower sampling budgets, where unmasking many tokens per step is desirable.

2510.16253 2026-05-14 cs.LG cs.AI q-bio.BM q-bio.QM stat.ML

Protein Folding with Neural Ordinary Differential Equations

Arielle Sanford, Shuo Sun, Christian B. Mendl

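Affiliations University of Chicago, Department of Computer Science; Technical University of Munich, School of CIT, Department of Computer Science; TUM Institute for Advanced Study, Lichtenbergstraße 2a

AI summary This paper proposes a continuous-depth Evoformer based on Neural Ordinary Differential Equations (Neural ODEs) for protein structure prediction. The method replaces the 48 discrete blocks of the original Evoformer with a continuous-time parameterization that preserves the core attention-based operations while substantially reducing computational cost. Experiments show that the model produces structurally plausible predictions with far fewer resources and captures certain secondary structure elements, although it does not fully match the accuracy of the original architecture.

Journal ref Mach. Learn.: Sci. Technol. 7, 035008 (2026)

English abstract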

Recent advances in protein structure prediction, such as AlphaFold, have demonstrated the power of deep neural architectures like the Evoformer for capturing complex spatial and evolutionary constraints on protein conformation. However, the depth of the Evoformer, comprising 48 stacked blocks, introduces high computational costs and rigid layerwise discretization. Inspired by Neural Ordinary Differential Equations (Neural ODEs), we propose a continuous-depth formulation of the Evoformer, replacing its 48 discrete blocks with a Neural ODE parameterization that preserves its core attention-based operations. This continuous-time Evoformer achieves constant memory cost (in depth) via the adjoint method, while allowing a principled trade-off between runtime and accuracy through adaptive ODE solvers. Benchmarking on protein structure prediction tasks, we find that the Neural ODE-based Evoformer produces structurally plausible predictions and reliably captures certain secondary structure elements, such as alpha-helices, though it does not fully replicate the accuracy of the original architecture. However, our model achieves this performance using dramatically fewer resources, just 17.5 hours of training on a single GPU, highlighting the promise of continuous-depth models as a lightweight and interpretable alternative for biomolecular modeling. This work opens new directions for efficient and adaptive protein structure prediction frameworks.
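The link between a stack of residual blocks and a continuous-depth model can be seen in a toy setting: a residual update h ← h + f(h) is one explicit Euler step of dh/dt = f(h, t). The sketch below integrates a scalar ODE with fixed-step Euler. It is only an illustration of that correspondence: the paper describes adaptive solvers and the adjoint method, neither of which is shown here, and the dynamics function is a hypothetical stand-in for the Evoformer operations.

```python
import math

def residual_stack(h, block, num_blocks):
    """Discrete depth: apply the same residual update num_blocks times."""
    for _ in range(num_blocks):
        h = h + block(h)
    return h

def euler_ode_forward(h, dynamics, t0=0.0, t1=1.0, steps=48):
    """Continuous depth: integrate dh/dt = dynamics(h, t) with explicit Euler."""
    dt = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        h = h + dt * dynamics(h, t)   # one Euler step plays the role of one block
        t += dt
    return h

def decay(h, t):
    """Hypothetical dynamics with a known solution: h(t) = h(0) * exp(-t)."""
    return -h

coarse = euler_ode_forward(1.0, decay, steps=48)     # cheaper, less accurate
fine = euler_ode_forward(1.0, decay, steps=1000)     # costlier, more accurate
exact = math.exp(-1.0)
```

Changing `steps` trades runtime for accuracy without changing any parameters, which is the fixed-step analogue of the runtime-accuracy trade-off the abstract attributes to adaptive solvers.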

2510.13999 2026-05-14 cs.LG cs.AI

REAP the Experts: Why Pruning Prevails for One-Shot MoE compression

Mike Lasby, Ivan Lazarevich, Nish Sinnadurai, Sean Lie, Yani Ioannou, Vithursan Thangarasa

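Affiliations Cerebras Systems Inc.; Schulich School of Engineering, University of Calgary

AI summary This paper studies expert compression for sparsely-activated Mixture-of-Experts (SMoE) models on generative tasks and finds that, contrary to recent results favouring expert merging on discriminative benchmarks, expert pruning is the better strategy. The authors propose REAP, a pruning criterion based on router gate-values and expert activation norms that minimizes the reconstruction error bound. Across several large SMoE models, REAP outperforms existing methods, especially at 50% compression, and achieves near-lossless compression on code generation tasks.

Comments 30 pages, 9 figures, 12 tables

English abstract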

Sparsely-activated Mixture-of-Experts (SMoE) models offer efficient pre-training and low latency but their large parameter counts create significant memory overhead, motivating research into expert compression. Contrary to recent findings favouring expert merging on discriminative benchmarks, we find that expert pruning is a superior strategy for generative tasks. We demonstrate that existing merging techniques introduce an irreducible error due to the loss of fine-grained routing control over experts. Leveraging this insight, we propose Router-weighted Expert Activation Pruning (REAP), a novel pruning criterion that considers both router gate-values and expert activation norms to minimize the reconstruction error bound. Across a diverse set of SMoE models ranging from 20B to 1T parameters, REAP consistently outperforms merging and other pruning methods on generative benchmarks, especially at 50% compression. Notably, our method achieves near-lossless compression on code generation tasks with Qwen3-Coder-480B and Kimi-K2, even after pruning 50% of experts.
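A pruning score that combines router gate-values with expert activation norms might look like the sketch below: for each expert, average the product of gate value and output norm over the tokens routed to it, then prune the lowest-scoring experts. The abstract names only the two ingredients, so the exact way they are combined here (product, then mean over routed tokens), the function names, and the toy data are assumptions rather than the paper's definition.

```python
import math

def expert_saliency(gate_values, expert_outputs):
    """Score each expert by mean(gate * ||output||) over tokens routed to it.

    gate_values[t][j]    : router gate for token t and expert j (0.0 if not routed)
    expert_outputs[t][j] : output vector of expert j on token t (ignored if not routed)
    """
    num_experts = len(gate_values[0])
    scores = []
    for j in range(num_experts):
        contributions = [
            gates[j] * math.sqrt(sum(v * v for v in outputs[j]))
            for gates, outputs in zip(gate_values, expert_outputs)
            if gates[j] > 0.0
        ]
        scores.append(sum(contributions) / len(contributions) if contributions else 0.0)
    return scores

def experts_to_prune(scores, fraction):
    """Indices of the lowest-scoring experts, covering the given fraction."""
    k = int(len(scores) * fraction)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j])
    return sorted(ranked[:k])

# Toy calibration data: 3 tokens, 4 experts, top-2 routing, 2-dimensional outputs.
gate_values = [
    [0.7, 0.3, 0.0, 0.0],
    [0.0, 0.6, 0.4, 0.0],
    [0.5, 0.0, 0.0, 0.5],
]
expert_outputs = [
    [[3.0, 4.0], [0.1, 0.0], [0.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 0.2], [1.0, 0.0], [0.0, 0.0]],
    [[6.0, 8.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]],
]

scores = expert_saliency(gate_values, expert_outputs)
pruned = experts_to_prune(scores, fraction=0.5)
```

Note that expert 1 is routed to as often as expert 0 in this toy data but contributes little to the layer output, so a frequency-only criterion would rank it very differently.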

2510.11303 2026-05-14 cs.CV

sketch2symm: Symmetry-aware sketch-to-shape generation via semantic bridging

Yan Zhou, Mingji Li, Xiantao Zeng, Jie Lin, Yuexia Zhou

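Affiliations School of Electronic Information Engineering, Foshan University, Guangdong, China; School of Computer Science and Artificial Intelligence, Foshan University, Guangdong, China

AI summary Sketch2Symm is a two-stage sketch-to-3D-shape generation method based on semantic bridging and symmetry constraints, designed to address the difficulty of 3D reconstruction from abstract, sparse sketch inputs. It enriches sketch representations through sketch-to-image translation and introduces symmetry priors that exploit the structural regularity of everyday objects, producing geometrically consistent 3D shapes. Experiments on mainstream sketch datasets show that it outperforms existing methods.

English abstract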

Sketch-based 3D reconstruction remains a challenging task due to the abstract and sparse nature of sketch inputs, which often lack sufficient semantic and geometric information. To address this, we propose Sketch2Symm, a two-stage generation method that produces geometrically consistent 3D shapes from sketches. Our approach introduces semantic bridging via sketch-to-image translation to enrich sparse sketch representations, and incorporates symmetry constraints as geometric priors to leverage the structural regularity commonly found in everyday objects. Experiments on mainstream sketch datasets demonstrate that our method achieves superior performance compared to existing sketch-based reconstruction methods in terms of Chamfer Distance, Earth Mover's Distance, and F-Score, verifying the effectiveness of the proposed semantic bridging and symmetry-aware design.
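Of the three metrics named above, Chamfer Distance is the simplest to state: for each point in one cloud, take the distance to its nearest neighbour in the other cloud, average, and add the same quantity in the reverse direction. The sketch below uses unsquared Euclidean distances and means. Conventions vary across papers (squared distances, sums instead of means), and the abstract does not say which variant is used, so this particular choice is an assumption.

```python
import math

def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two point clouds (lists of coordinate tuples)."""
    def mean_nearest(src, dst):
        # Average distance from each source point to its nearest destination point.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return mean_nearest(points_a, points_b) + mean_nearest(points_b, points_a)

# Hypothetical tiny point clouds standing in for a predicted and a reference shape.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]

cd = chamfer_distance(predicted, reference)
cd_same = chamfer_distance(reference, reference)
```

This brute-force version is quadratic in the number of points; evaluations on dense point clouds typically use a spatial index or a batched GPU implementation instead.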