arXivDaily: daily arXiv digest, updated Monday to Friday
2512.22287 2026-06-12 cs.LG cs.AI (updated version)

Cluster Aggregated GAN (CAG): A Cluster-Based Hybrid Model for Appliance Pattern Generation

Zikun Guo, Adeyinka.P. Adedigba, Rammohan Mallipeddi

Affiliations: Department of Artificial Intelligence, School of Electronics Engineering, Kyungpook National University

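AI summary: Existing generative methods ignore the behavioral differences between intermittent and continuous appliances, which causes unstable training and limited fidelity. The proposed CAG framework uses a clustering module to assign dedicated generators to intermittent appliances and an LSTM generator to continuous appliances, and it outperforms baseline methods on the UVIC dataset.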

Comments
18 pages, 5 figures
English abstract

Synthetic appliance data are essential for developing non-intrusive load monitoring algorithms and enabling privacy-preserving energy research, yet the scarcity of labeled datasets remains a significant barrier. Recent GAN-based methods have demonstrated the feasibility of synthesizing load patterns, but most existing approaches treat all devices uniformly within a single model, neglecting the behavioral differences between intermittent and continuous appliances and resulting in unstable training and limited output fidelity. To address these limitations, we propose the Cluster Aggregated GAN framework, a hybrid generative approach that routes each appliance to a specialized branch based on its behavioral characteristics. For intermittent appliances, a clustering module groups similar activation patterns and allocates dedicated generators for each cluster, ensuring that both common and rare operational modes receive adequate modeling capacity. Continuous appliances follow a separate branch that employs an LSTM-based generator to capture gradual temporal evolution while maintaining training stability through sequence compression. Extensive experiments on the UVIC smart plug dataset demonstrate that the proposed framework consistently outperforms baseline methods across metrics measuring realism, diversity, and training stability, and that integrating clustering as an active generative component substantially improves both interpretability and scalability. These findings establish the proposed framework as an effective approach for synthetic load generation in non-intrusive load monitoring research.
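The routing-then-clustering idea in the CAG abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the duty-cycle routing rule, the 5 W on-threshold, the 0.8 cutoff, and the (duration, mean power) activation features are all hypothetical choices, and plain k-means stands in for whatever clustering module the paper actually uses.

```python
import random
from math import dist

def duty_cycle(power, on_threshold=5.0):
    """Fraction of samples in which the appliance draws more than on_threshold watts."""
    return sum(p > on_threshold for p in power) / len(power)

def route(power, on_threshold=5.0, continuous_cutoff=0.8):
    """Hypothetical routing rule: appliances that are on most of the time go to the
    continuous (LSTM) branch; the rest go to the clustered intermittent branch."""
    if duty_cycle(power, on_threshold) >= continuous_cutoff:
        return "continuous"
    return "intermittent"

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means on activation feature vectors; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)

    def nearest(p):
        # Index of the closest current center (reads the latest `centers`).
        return min(range(k), key=lambda i: dist(p, centers[i]))

    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p)].append(p)
        # Move each center to the mean of its group; leave empty clusters in place.
        centers = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return [nearest(p) for p in points], centers
```

Under this sketch, each cluster label would index its own generator in the intermittent branch, so a rare operating mode gets a generator of its own instead of being averaged into the common one.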

2601.17654 2026-06-12 cs.LG cs.DC (updated version)

Kareus: Joint Reduction of Dynamic and Static Energy in Large Model Training

Ruofan Wu, Jae-Won Chung, Mosharaf Chowdhury

Affiliations: University of Michigan

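AI summary: To address the high energy cost of AI training, the authors propose Kareus, a system that jointly optimizes fine-grained kernel scheduling and frequency scaling to reduce both dynamic and static energy, saving up to 28.3% energy at the same training time or cutting training time by up to 27.5% at the same energy.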

Comments
OSDI '26 | Open-source at this https URL
English abstract

The computing demand of AI is growing at an unprecedented rate, but energy supply is not keeping pace. As a result, energy has become an expensive and contended resource that requires explicit management and optimization. Although recent works have made significant progress in large model training optimization, they focus on optimizing either dynamic or static energy consumption. We find that fine-grained kernel scheduling and frequency scaling jointly and interdependently impact both dynamic and static energy consumption. Based on this finding, we design Kareus, a training system that pushes the time-energy tradeoff frontier by optimizing both aspects. Kareus decomposes the intractable joint optimization problem into local, partition-based subproblems. It then uses a multi-pass multi-objective optimization algorithm to find execution schedules that push the time-energy tradeoff frontier. Compared to the state of the art, Kareus reduces training energy by up to 28.3% at the same training time, or reduces training time by up to 27.5% at the same energy consumption.
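The "time-energy tradeoff frontier" in the Kareus abstract is a Pareto frontier over candidate execution schedules. A minimal sketch of how such a frontier, and the iso-time energy comparison behind numbers like "less energy at the same training time", can be computed from (time, energy) measurements. The schedule values are made up, and this is not Kareus's multi-pass optimizer.

```python
def pareto_frontier(schedules):
    """Keep (time, energy) pairs that no other pair beats on both axes, sorted by time."""
    frontier = []
    best_energy = float("inf")
    for time, energy in sorted(schedules):
        # Scanning in order of increasing time, a schedule is non-dominated only if
        # it uses strictly less energy than every faster (or equally fast) one.
        if energy < best_energy:
            frontier.append((time, energy))
            best_energy = energy
    return frontier

def min_energy_within(frontier, time_budget):
    """Lowest energy reachable without exceeding time_budget (None if infeasible)."""
    feasible = [energy for time, energy in frontier if time <= time_budget]
    return min(feasible) if feasible else None

# Hypothetical measurements: (iteration time in s, energy in J) per candidate schedule.
schedules = [(10, 100), (11, 80), (12, 85), (13, 70), (10, 110)]
frontier = pareto_frontier(schedules)
print(frontier)                          # [(10, 100), (11, 80), (13, 70)]
print(min_energy_within(frontier, 12))   # 80
```

"Pushing the frontier" then means finding schedules that lie below this curve, that is, less energy at some fixed time or less time at some fixed energy.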

2509.03340 2026-06-12 cs.LG cs.AI cs.CE physics.comp-ph (updated version)

Equivariant Flow Matching for Symmetry-Breaking Bifurcation Problems

Fleur Hendriks, Ondřej Rokoš, Martin Doškář, Marc G.D. Geers, Vlado Menkovski

Affiliations: Department of Mechanical Engineering, Eindhoven University of Technology; DIFFER – Dutch Institute for Fundamental Energy Research; Faculty of Civil Engineering, Department of Mechanics, Czech Technical University in Prague; Department of Mathematics and Computer Science, Eindhoven University of Technology

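AI summary: To model the coexistence of multiple stable states caused by symmetry breaking in nonlinear dynamical systems, the authors propose equivariant flow matching, which combines an equivariant architecture with an optimal-transport coupling mechanism. It accurately captures multimodal distributions and symmetry-breaking bifurcations and outperforms non-probabilistic and variational methods.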

Comments
9 pages, 7 figures including appendices. Accepted to Machine Learning and the Physical Sciences Workshop, NeurIPS 2025 ( this https URL ). Repository with corresponding code: this https URL. Video explanation: this https URL
English abstract

Bifurcation phenomena in nonlinear dynamical systems often lead to multiple coexisting stable solutions, particularly in the presence of symmetry breaking. Deterministic machine learning models are unable to capture this multiplicity, averaging over solutions and failing to represent lower-symmetry outcomes. In this work, we formalize the use of generative AI, specifically flow matching, as a principled way to model the full probability distribution over bifurcation outcomes. Our approach builds on existing techniques by combining flow matching with equivariant architectures and an optimal-transport-based coupling mechanism. We generalize equivariant flow matching to a symmetric coupling strategy that aligns predicted and target outputs under group actions, allowing accurate learning in equivariant settings. We validate our approach on a range of systems, from simple conceptual systems to physical problems such as buckling beams and the Allen-Cahn equation. The results demonstrate that the approach accurately captures multimodal distributions and symmetry-breaking bifurcations. Moreover, our results demonstrate that flow matching significantly outperforms non-probabilistic and variational methods. This offers a principled and scalable solution for modeling multistability in high-dimensional systems.
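A minimal sketch of the two ingredients named above, under stated assumptions: the linear-interpolation path commonly used in conditional flow matching, and a symmetric coupling for the two-element group {identity, negation} that replaces a target by its group image closest to a reference (a base sample or a model prediction) before the regression target is formed. The choice of group, the squared-distance criterion, and all function names are illustrative assumptions, not the paper's code.

```python
def z2_orbit(x):
    """Orbit of x under the two-element group {identity, negation}."""
    return [list(x), [-v for v in x]]

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def align_under_group(reference, target, orbit=z2_orbit):
    """Symmetric coupling: replace target by its group image closest to reference."""
    return min(orbit(target), key=lambda candidate: sq_dist(reference, candidate))

def flow_matching_pair(x0, x1, t):
    """Linear path x_t = (1 - t) x0 + t x1 and its constant velocity u = x1 - x0."""
    xt = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    u = [b - a for a, b in zip(x0, x1)]
    return xt, u

# Two mirror-image branches of a bifurcation: align before building the target.
x0 = [0.9, 1.1]                          # illustrative base sample
x1 = [-1.0, -1.0]                        # data sample on the "negative" branch
x1_aligned = align_under_group(x0, x1)   # [1.0, 1.0]
xt, u = flow_matching_pair(x0, x1_aligned, t=0.5)
```

In training, the network's velocity prediction at `xt` would be regressed onto `u`; without the alignment, mirror-image targets pull the regression toward their symmetric average.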

2507.02921 2026-06-12 cs.LG cs.AI (updated version)

PlaceRep: Geospatial Place Representation Learning from Large-Scale Point-of-Interest Data

Mohammad Hashemi, Hossein Amiri, Andreas Zufle

Affiliations: Emory University

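AI summary: PlaceRep builds place-level representations by clustering spatially and semantically related POIs, efficiently producing multi-scale urban region embeddings without pre-training. It outperforms existing methods on population density estimation and housing price prediction and achieves up to a 100× speedup.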

English abstract

Learning effective representations of urban environments requires capturing spatial structure beyond fixed administrative boundaries. Existing geospatial representation learning approaches typically aggregate Points of Interest (POIs) into pre-defined administrative regions such as census units or ZIP code areas, assigning a single embedding to each region. However, POIs often form semantically meaningful groups that extend across, within, or beyond these boundaries, defining places that better reflect human activity and urban function. To address this limitation, we propose PlaceRep, a geospatial representation learning method that constructs place-level representations by clustering spatially and semantically related POIs. PlaceRep summarizes large-scale POI graphs from U.S. Foursquare data to produce general-purpose urban region embeddings while automatically identifying places across multiple spatial scales. By eliminating model pre-training, PlaceRep provides a scalable and efficient solution for multi-granular geospatial analysis. Experiments on two downstream tasks, population density estimation and housing price prediction, show that PlaceRep outperforms most state-of-the-art graph-based geospatial representation learning methods and achieves up to a 100× speedup in generating region-level representations on large-scale POI graphs. The implementation of PlaceRep is available at this https URL.

2601.15503 2026-06-12 cs.LG (updated version)

Data-driven Lake Water Quality Forecasting for Time Series with Missing Data using Machine Learning

Rishit Chatterjee, Tahiya Chowdhury

Affiliations: Department of Computer Science, Colby College

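AI summary: To handle missing data in volunteer-led lake monitoring, the authors use multiple imputation and ridge regression to forecast water clarity on a 30-lake dataset, quantify the minimal sample size and feature set, and propose a joint feasibility function to guide monitoring strategy.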

Comments
8 pages, 4 figures, 3 tables
English abstract

Volunteer-led lake monitoring yields irregular, seasonal time series with many gaps arising from ice cover, weather-related access constraints, and occasional human errors, complicating forecasting and early warning of harmful algal blooms. We study Secchi Disk Depth (SDD) forecasting on a 30-lake, data-rich subset drawn from three decades of in-situ records collected across Maine lakes. Missingness is handled via Multiple Imputation by Chained Equations (MICE), and we evaluate performance with a normalized Mean Absolute Error (nMAE) metric for cross-lake comparability. Among six candidates, ridge regression provides the best mean test performance. Using ridge regression, we then quantify the minimal sample size, showing that under a backward, recent-history protocol, the model reaches within 5% of full-history accuracy with approximately 176 training samples per lake on average. We also identify a minimal feature set, where a compact four-feature subset matches the thirteen-feature baseline within the same 5% tolerance. Bringing these results together, we introduce a joint feasibility function that identifies the minimal training history and fewest predictors sufficient to achieve the target of staying within 5% of the complete-history, full-feature baseline. In our study, meeting the 5% accuracy target required about 64 recent samples and just one predictor per lake, highlighting the practicality of targeted monitoring. Hence, our joint feasibility strategy unifies recent-history length and feature choice under a fixed accuracy target, yielding a simple, efficient rule for setting sampling effort and measurement priorities for lake researchers.
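The "within 5% of the baseline" criterion and the joint feasibility search can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes nMAE means MAE divided by the mean observed value, reads "within 5%" as a relative increase in error, and uses a hypothetical tie-break (fewest predictors first, then shortest history). The error table is invented.

```python
def nmae(y_true, y_pred):
    """Mean absolute error divided by the mean observed value.
    (One common nMAE convention; the paper's exact normalisation is assumed.)"""
    n = len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return mae / (sum(y_true) / n)

def joint_feasible(errors, baseline_error, tolerance=0.05):
    """errors maps (n_recent_samples, n_features) -> test nMAE.
    Returns the cheapest configuration whose error stays within `tolerance`
    (relative) of the full-history, full-feature baseline, or None.
    Hypothetical tie-break: fewest features first, then shortest history."""
    limit = baseline_error * (1 + tolerance)
    feasible = [cfg for cfg, err in errors.items() if err <= limit]
    if not feasible:
        return None
    return min(feasible, key=lambda cfg: (cfg[1], cfg[0]))

# Made-up error table, for illustration only.
errors = {(25, 1): 0.120, (50, 1): 0.104, (50, 4): 0.102, (200, 13): 0.101}
print(joint_feasible(errors, baseline_error=0.100))   # (50, 1)
```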

2601.13591 2026-06-12 cs.AI cs.CL (updated version)

DSAEval: Evaluating Data Science Agents on a Wide Range of Real-World Data Science Problems

Maojun Sun, Yifei Xie, Yue Wu, Ruijian Han, Binyan Jiang, Defeng Sun, Yancheng Yuan, Jian Huang

Affiliations: Department of Data Science and Artificial Intelligence, Hong Kong Polytechnic University; Department of Applied Mathematics, Hong Kong Polytechnic University

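AI summary: The authors propose DSAEval, a benchmark of 641 real-world data science problems featuring multimodal environment perception, multi-query interactions, and multi-dimensional evaluation. A systematic evaluation of 13 advanced LLM agents finds that Claude-Sonnet-4.5 performs best overall and that multimodal perception improves performance on vision tasks by 2.04% to 11.30%.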

English abstract

Recent LLM-based data agents aim to automate data science tasks ranging from data analysis to deep learning. However, the open-ended nature of real-world data science problems, which often span multiple taxonomies and lack standard answers, poses a significant challenge for evaluation. To address this, we introduce DSAEval, a benchmark comprising 641 real-world data science problems grounded in 285 diverse datasets, covering both structured and unstructured data (e.g., image and text). DSAEval incorporates three distinctive features: (1) Multimodal Environment Perception, which enables agents to interpret observations from multiple modalities, including text and vision; (2) Multi-Query Interactions, which mirror the iterative and cumulative nature of real-world data science projects; and (3) Multi-Dimensional Evaluation, which provides a holistic assessment across reasoning, code, and results. We systematically evaluate 13 recent advanced agentic LLMs using DSAEval. Our results show that Claude-Sonnet-4.5 achieves the strongest overall performance, MiMo-V2-Pro and GPT-5.2 lead in duration and step efficiency, respectively, and MiMo-V2-Flash is the most cost-effective. We further demonstrate that multimodal perception consistently improves performance on vision-related tasks, with gains ranging from 2.04% to 11.30%. Overall, while current data science agents perform well on structured data and routine data analysis workflows, substantial challenges remain in unstructured domains. Finally, we offer critical insights and outline future research directions.

2508.12681 2026-06-12 cs.RO cs.LG eess.SY (updated version)

Adaptive Model-Predictive Control of a Soft Continuum Robot Using a Physics-Informed Neural Network Based on Cosserat Rod Theory

Johann Licher, Max Bartholdt, Henrik Krauss, Tim-Lukas Habich, Thomas Seel, Moritz Schappler

Affiliations: Institute of Mechatronic Systems, Leibniz University Hannover; Department of Advanced Interdisciplinary Studies, The University of Tokyo; Institute of Assembly Technology and Robotics, Leibniz University of Hannover

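AI summary: The authors propose a real-time nonlinear model-predictive control framework based on a domain-decoupled physics-informed neural network (DD-PINN), achieving high-accuracy dynamic control of soft continuum robots with position errors below 3 mm.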

Comments
Submitted to IEEE Transactions on Robotics, 20 pages, 14 figures
English abstract

Dynamic control of soft continuum robots (SCRs) holds great potential for expanding their applications, but remains a challenging problem due to the high computational demands of accurate dynamic models. While data-driven approaches like Koopman-operator-based methods have been proposed, they typically lack adaptability and cannot reconstruct the full robot shape, limiting their applicability. This work introduces a real-time-capable nonlinear model-predictive control (MPC) framework for SCRs based on a domain-decoupled physics-informed neural network (DD-PINN) with adaptable bending stiffness. The DD-PINN serves as a surrogate for the dynamic Cosserat rod model with a speed-up factor of up to 44,000. It is also used within an unscented Kalman filter for estimating the model states and bending compliance from end-effector position measurements. We implement a nonlinear evolutionary MPC running at 70 Hz on the GPU. In simulation, it demonstrates accurate tracking of dynamic trajectories and setpoint control with end-effector position errors below 3 mm (2.3% of the actuator's length). In real-world experiments, the controller achieves similar accuracy and accelerations up to 3.55 m/s².

2509.18085 2026-06-12 cs.LG cs.AI cs.CL (updated version)

Structuring The Future: Diffusion LLM Speculative Decoding via Calibrated Draft Graphs

Sudhanshu Agrawal, Risheek Garrepalli, Raghavv Goel, Christopher Lott, Fatih Porikli, Mingu Lee

Affiliations: University of Waterloo

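AI summary: The authors propose Spiffy, an algorithm that uses calibrated draft-graph structures for speculative decoding of diffusion LLMs, accelerating inference while preserving the output distribution, with up to 8.6× fewer model inferences and up to 6.3× higher token-generation rate.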

Comments
Original version uploaded on Sep 22, 2025. (v2): Extended Table 2 with additional analysis and referenced it in Sec 5.2. (v3): Added note to Sec 4.2 and Appendix A.2 specifying conditions for losslessness. (v4): Updated with the version accepted to ICML 2026 workshops
English abstract

Diffusion LLMs (dLLMs) have recently emerged as a powerful alternative to autoregressive LLMs (AR-LLMs) with the potential to operate at significantly higher token-generation rates. To unlock this potential, we present Spiffy, a speculative decoding algorithm to accelerate dLLM inference while provably preserving the model's output distribution. This work addresses the unique challenges involved in applying ideas from speculative decoding of AR-LLMs to dLLMs. Spiffy performs auto-speculation to eliminate the overheads of an independent draft model, structuring draft states in the form of a novel directed draft graph to take advantage of the bidirectional, blockwise nature of dLLM generation. These draft graphs are calibrated offline to maximize acceptance rates and are dynamically pruned during inference for improved computational efficiency. We present a detailed formulation of Spiffy and demonstrate its ability to accelerate LLaDA, Dream, and SDAR models in combination with KV caching and threshold-based dynamic unmasking, leading to up to an 8.6× reduction in model inferences and up to a 6.3× acceleration in token rate.
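For readers unfamiliar with the AR-LLM idea the Spiffy abstract builds on, here is a sketch of the standard speculative sampling accept/reject step: a drafted token is kept with probability min(1, p/q), and on rejection a replacement is drawn from the normalised residual max(0, p - q), which makes the output distributed exactly as the target p. This is only the generic AR building block; Spiffy's auto-speculation, draft graphs, and dLLM-specific verification are not shown, and the function name is hypothetical.

```python
import random

def speculative_accept(draft_token, p, q, rng):
    """One accept/reject step of standard speculative sampling (AR-LLM setting).

    p: target-model distribution, q: draft distribution (lists summing to 1),
    draft_token: a token sampled from q. The returned token is distributed as p."""
    if rng.random() < min(1.0, p[draft_token] / q[draft_token]):
        return draft_token, True
    # On rejection, resample from the normalised residual max(0, p - q).
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    return rng.choices(range(len(p)), weights=residual, k=1)[0], False

# Empirical check with toy distributions: output frequencies approach p, not q.
rng = random.Random(0)
p, q = [0.6, 0.3, 0.1], [0.2, 0.3, 0.5]
counts = [0, 0, 0]
for _ in range(20000):
    draft = rng.choices(range(3), weights=q, k=1)[0]
    token, _ = speculative_accept(draft, p, q, rng)
    counts[token] += 1
print([c / 20000 for c in counts])   # close to p = [0.6, 0.3, 0.1]
```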

2601.06227 2026-06-12 cs.LG cs.AI (updated version)

When Smaller Wins: Dual-Stage Distillation and Pareto-Guided Compression of Liquid Neural Networks for Edge Battery Prognostics

Dhivya Dharshini Kannan, Wei Li, Wei Zhang, Jianbiao Wang, Zhi Wei Seh, Man-Fai Ng

Affiliations: Singapore Institute of Technology; Institute of Materials Research and Engineering; Agency for Science, Technology and Research; Institute of High Performance Computing

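AI summary: The authors propose DLNet, which uses Euler discretization, dual-stage knowledge distillation, and Pareto-guided compression to turn a high-capacity liquid neural network into edge-deployable models; in battery health prognostics, the small model outperforms the large one.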

Comments
Accepted at International Conference on Pattern Recognition, ICPR 2026. Code available at: this https URL
English abstract

Battery management systems increasingly require accurate battery health prognostics under strict on-device constraints. This paper presents DLNet, a practical framework with dual-stage distillation of liquid neural networks that turns a high-capacity model into compact and edge-deployable models for battery health prediction. DLNet first applies Euler discretization to reformulate liquid dynamics for embedded compatibility. It then performs dual-stage knowledge distillation to transfer the teacher model's temporal behavior and recover it after further compression. Pareto-guided selection under joint error-cost objectives retains student models that balance accuracy and efficiency. We evaluate DLNet on a widely used dataset and validate real-device feasibility on an Arduino Nano 33 BLE Sense using int8 deployment. The final deployed student achieves a low error of 0.0066 when predicting battery health over the next 100 cycles, which is 15.4% lower than the teacher model. It reduces the model size from 616 kB to 94 kB, an 84.7% reduction, and takes 21 ms per inference on the device. These results support a practical "smaller wins" observation: with proper supervision and selection, a small model can match or exceed a large teacher for edge-based prognostics. Beyond batteries, the DLNet framework can extend to other industrial analytics tasks with strict hardware constraints.
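The abstract says DLNet first applies Euler discretization to the liquid dynamics. As a hedged illustration, the sketch below takes the liquid time-constant (LTC) neuron as it is commonly written in the liquid neural network literature, dx/dt = -(1/τ + f(x, I)) x + f(x, I) A with a sigmoid gate f, and advances it with fixed-step explicit Euler updates. The scalar neuron, the sigmoid parameterization, and all parameter values are assumptions for illustration; DLNet's actual cell may differ.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ltc_euler_step(x, inp, dt, tau, A, w_x, w_in, b):
    """One explicit-Euler step of a scalar liquid time-constant neuron:
    dx/dt = -(1/tau + f) * x + f * A, with f = sigmoid(w_x*x + w_in*inp + b)."""
    f = sigmoid(w_x * x + w_in * inp + b)
    dxdt = -(1.0 / tau + f) * x + f * A
    return x + dt * dxdt

def run(inputs, dt=0.1, tau=1.0, A=1.0, w_x=0.5, w_in=1.0, b=0.0, x0=0.0):
    """Unroll the discretized neuron over an input sequence; returns the state trace."""
    x = x0
    trace = []
    for u in inputs:
        x = ltc_euler_step(x, u, dt, tau, A, w_x, w_in, b)
        trace.append(x)
    return trace
```

A fixed-step update like this needs only multiply-adds and one sigmoid per step, with no adaptive ODE solver, which is presumably what makes it friendly to microcontroller targets such as the one in the abstract.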

2601.03184 2026-06-12 cs.LG cs.AI (updated version)

Decentralized Autoregressive Generation

Stepan Maschan, Haoxuan Qu, Jun Liu

Affiliations: Lancaster University

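AI summary: Using the Discrete Flow Matching framework, the paper proves that decentralized and centralized training are theoretically equivalent, and experiments confirm that decentralized training remains competitive on multimodal benchmarks.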

English abstract

The decentralization of autoregressive generation has attracted considerable attention in recent years as a solution to scaling bottlenecks. However, despite promising empirical results, this paradigm currently lacks rigorous theoretical justification. In this work, we formally establish the theoretical equivalence between decentralized and centralized training. To achieve this, we adapt the Discrete Flow Matching framework for autoregressive generation, leveraging its inherent properties to demonstrate that global models naturally decompose into independent experts. Finally, we conduct extensive experiments across diverse multimodal benchmarks, empirically validating that decentralized training maintains competitive parity with standard centralized architectures.

2601.06279 2026-06-12 cs.CV (updated version)

EyeTheia: A Lightweight and Accessible Eye-Tracking Toolbox

Stevenson Pather, Niels Martignène, Arnaud Bugnet, Fouad Boutaleb, Fabien D'Hondt, Deise Santana Maia

Affiliations: Univ. Lille, Inserm, CHU Lille, U1172 - LilNCog - Lille Neuroscience & Cognition; Univ. Lille, CNRS, Centrale Lille, UMR 9189 CRIStAL; Centre national de ressources et de résilience (CN2R)

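AI summary: The authors propose EyeTheia, a lightweight webcam-based eye-tracking pipeline that combines MediaPipe feature extraction with a CNN model and reduces prediction error through user-specific fine-tuning; in a dot-probe task, its results agree with those of a commercial solution.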

Comments
Code for the EyeTheia: this https URL. Experimental platform for the cognitive neuroscience task (BAWEB IAPS): this https URL
English abstract

We introduce EyeTheia, a lightweight and open deep learning pipeline for webcam-based gaze estimation, designed for browser-based experimental platforms and real-world cognitive and clinical research. EyeTheia enables real-time gaze tracking using only a standard laptop webcam, combining MediaPipe-based landmark extraction with a convolutional neural network inspired by iTracker and optional user-specific fine-tuning. We investigate two complementary strategies: adapting a model pretrained on mobile data and training the same architecture from scratch on a desktop-oriented dataset. Validation results on MPIIFaceGaze show comparable performance between both approaches prior to calibration, while lightweight user-specific fine-tuning consistently reduces gaze prediction error. We further evaluate EyeTheia in a realistic Dot-Probe task and compare it to the commercial webcam-based tracker SeeSo SDK. Results indicate strong agreement in left-right gaze allocation during stimulus presentation, despite higher temporal variability. Overall, EyeTheia provides a transparent and extensible solution for low-cost gaze tracking, suitable for scalable and reproducible experimental and clinical studies. The code, trained models, and experimental materials are publicly available.

2601.04885 2026-06-12 cs.CL cs.AI cs.LG (updated version)

CuMA: Aligning LLMs with Sparse Cultural Values via Demographic-Aware Mixture of Adapters

Ao Sun, Xiaoyu Wang, Zhe Tan, Yu Li, Jiachen Zhu, Shu Su, Yuheng Jia

Affiliations: Southeast University; ByteDance Inc.; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China

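AI summary: The authors propose CuMA, which uses demographic-aware routing to separate conflicting gradients into expert subspaces, addressing the mean collapse of dense models in multicultural alignment, and achieves state-of-the-art performance on benchmarks including WorldValuesBench.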

Comments
ACL 2026 Main
English abstract

As Large Language Models (LLMs) serve a global audience, alignment must transition from enforcing universal consensus to respecting cultural pluralism. We demonstrate that dense models, when forced to fit conflicting value distributions, suffer from Mean Collapse, converging to a generic average that fails to represent diverse groups. We attribute this to Cultural Sparsity, where gradient interference prevents dense parameters from spanning distinct cultural modes. To resolve this, we propose CuMA (Cultural Mixture of Adapters), a framework that frames alignment as a conditional capacity separation problem. By incorporating demographic-aware routing, CuMA internalizes a Latent Cultural Topology to explicitly disentangle conflicting gradients into specialized expert subspaces. Extensive evaluations on WorldValuesBench, Community Alignment, and PRISM demonstrate that CuMA achieves state-of-the-art performance, significantly outperforming both dense baselines and semantic-only MoEs. Crucially, our analysis confirms that CuMA effectively mitigates mean collapse, preserving cultural diversity. Our code is available at this https URL.
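A toy sketch of demographic-aware routing over a mixture of adapters, to make "routing conflicting gradients into expert subspaces" concrete. The abstract does not specify CuMA's router or adapter form, so everything here is a hypothetical design: gate weights depend only on a demographic feature vector, and each adapter is a small dense matrix applied residually. Because a one-hot demographic input drives nearly all gate mass to one adapter, gradients from different groups mostly update different adapters.

```python
import math

def softmax(logits):
    m = max(logits)                        # subtract max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def mixture_of_adapters(h, demographics, router, adapters):
    """h: hidden state; demographics: demographic feature vector (e.g. one-hot region);
    router: one row of weights per adapter; adapters: list of square matrices.
    Output is residual: h + sum_i gate_i * (A_i h). All names are hypothetical."""
    gates = softmax(matvec(router, demographics))
    out = list(h)
    for gate, A in zip(gates, adapters):
        delta = matvec(A, h)
        out = [o + gate * d for o, d in zip(out, delta)]
    return out, gates
```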

2507.07947 2026-06-12 cs.LG cs.AI (updated version)

Reconstructing Template-Memorized Images from Natural Prompts

Sol Yarkoni, Mahmood Sharif, Roi Livni

Affiliations: School of Electrical & Computer Engineering; School of Computer Science & AI; Tel Aviv University

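AI summary: The authors propose a low-resource attack that exploits patterns in templated e-commerce data to reconstruct memorized training images from natural prompts, exposing privacy risks.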

English abstract

Recent advances in generative models, such as diffusion models, have raised concerns related to privacy, copyright infringement, and data stewardship. To better understand and control these risks, prior work has introduced techniques and attacks that reconstruct images, or parts of images, from training data. While these results demonstrate that training data can be recovered, existing methods often rely on high computational resources, partial access to the training set, or carefully engineered prompts. In this work, we present a new attack that requires low resources, assumes little to no access to the training data, and identifies seemingly benign prompts that can lead to potentially risky image reconstruction. We further show that such reconstructions may occur unintentionally, even for users without specialized knowledge. For example, we observe that for one existing model, the prompt "blue Unisex T-Shirt" generates the face of a real individual. Moreover, by combining the identified vulnerabilities with real-world prompt data, we discover prompts that reproduce memorized visual elements. Our approach builds on insights from prior work and leverages domain knowledge to expose a fundamental vulnerability arising from the use of scraped e-commerce data, where templated layouts and images are closely tied to pattern-like textual prompts. The code for our attack is publicly available at this https URL.

2601.02177 2026-06-12 cs.CV cs.CR (updated version)

Why Commodity WiFi Sensors Fail at Multi-Person Gait Identification: A Systematic Analysis Using ESP32

Oliver Custance, Saad Khan, Simon Parkinson

Affiliations: University of Cambridge

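AI summary: Experiments with ESP32 devices show that poor multi-person gait identification stems mainly from the sensing-quality limits of commodity WiFi rather than from the choice of algorithm.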

English abstract

WiFi Channel State Information (CSI) has shown promise for single-person gait identification, raising interest in its use for contactless biometrics, continuous authentication, and passive identification. However, the feasibility of multi-person identification on low-cost commodity devices remains unclear. A critical question is whether weak multi-person performance is primarily an algorithmic limitation, or whether it reflects a more fundamental sensing ceiling on commodity WiFi hardware. We address this question through a systematic empirical study using commodity ESP32 WiFi sensors. We evaluated six signal separation methods (FastICA, SOBI, PCA-ICA, NMF, Wavelet, and Tensor decomposition) across seven scenarios spanning 1-10 people in both controlled and realistic indoor environments. To look beyond classification accuracy, we introduce three diagnostic metrics: intra-subject variability (ISV), inter-subject distinguishability (ISD), and performance degradation rate (PDR). Across all methods, performance remains moderate (39%-56% accuracy), with limited evidence that algorithmic choice alone solves the problem. The best-performing method, NMF, reaches 56% accuracy, while all methods exhibit extremely high feature-space overlap (97%-99%), unstable within-subject representations, and marked environmental sensitivity. These findings suggest that, under commodity ESP32 CSI constraints, dense multi-person gait identification is limited more by sensing quality and spatial diversity than by the chosen separation algorithm. Our results have direct implications for security and privacy: they call into question the practicality of commodity WiFi CSI as a robust multi-user biometric primitive for authentication, while also placing important bounds on the passive identification capabilities achievable with low-cost off-the-shelf WiFi hardware.
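NMF, the best-performing separation method in this study, factorizes a nonnegative matrix V into nonnegative factors W and H. The sketch below is the generic textbook algorithm (Lee-Seung multiplicative updates for the Frobenius loss) in plain Python, not the authors' pipeline; treating the rows of V as CSI amplitude streams (for example, subcarriers over time) and the rank as the number of latent sources is an assumption.

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def frob_err(V, W, H):
    """Frobenius norm of the reconstruction error V - W @ H."""
    WH = matmul(W, H)
    return sum((v - w) ** 2 for rv, rw in zip(V, WH) for v, w in zip(rv, rw)) ** 0.5

def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Lee-Seung multiplicative-update NMF: V (m x n, nonnegative) ~ W (m x rank) @ H (rank x n).
    Returns W, H, and the reconstruction error after each iteration."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() for _ in range(n)] for _ in range(rank)]
    errors = [frob_err(V, W, H)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H); eps guards against division by zero.
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * nu / (de + eps) for h, nu, de in zip(rh, rn, rd)]
             for rh, rn, rd in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * nu / (de + eps) for w, nu, de in zip(rw, rn, rd)]
             for rw, rn, rd in zip(W, num, den)]
        errors.append(frob_err(V, W, H))
    return W, H, errors
```

Because both updates only multiply by nonnegative ratios, W and H stay nonnegative, which is what lets each factor be read as an additive source contribution.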

2304.13836 2026-06-12 cs.LG cs.AI cs.CV stat.ME Version update

On Pitfalls of $\textit{RemOve-And-Retrain}$: Data Processing Inequality Perspective

Junhwa Song, Keumgang Cha, Junghoon Seo

Affiliations: KAIST

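AI summary: Exposes a flaw in the ROAR benchmark from an information-theoretic perspective. Data-agnostic post-processing can raise ROAR scores, which leads to misjudging how much information an attribution map carries. The paper also identifies a bias toward blurry masks.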

Comments
Accepted at the 2026 ICML Workshop on Mechanistic Interpretability
Abstract

The RemOve-And-Retrain (ROAR) benchmark is widely used to evaluate feature attribution methods, yet its validity remains underexplored from an information-theoretic perspective. We show that model- and data-agnostic post-processing of attribution maps (transformations that, by the data processing inequality, \emph{cannot} add information about the decision function) can often improve ROAR scores. This means that an improved ROAR ranking is not, by itself, evidence that an attribution map carries more information about the model. We trace this failure mode to a bias toward spatially blurry masks. Experiments on CIFAR-10, SVHN, and CUB-200 show a consistent association between blurriness and ROAR performance, a pattern that also appears in the ROAD variant. We provide guidelines for more cautious removal-based benchmarking, with implications for validating mechanistic understanding of neural network internals.
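ROAR ranks pixels by attribution, removes the top fraction, retrains, and measures how much accuracy drops. The minimal sketch below covers only the ranking and removal step, plus a box blur as one example of model- and data-agnostic post-processing. The blurred map is a fixed function of the original map, so by the data processing inequality it cannot carry more information about the model. It can, however, change which pixels ROAR removes. The 4×4 map, the constant fill value, and the tie-breaking rule are illustrative assumptions, and retraining is omitted.

```python
def box_blur(attr, radius=1):
    """Data-agnostic post-processing: replace each cell of a 2-D attribution map
    by the mean of its (2*radius+1)^2 neighborhood, clipped at the borders."""
    h, w = len(attr), len(attr[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [attr[y][x]
                    for y in range(max(0, i - radius), min(h, i + radius + 1))
                    for x in range(max(0, j - radius), min(w, j + radius + 1))]
            out[i][j] = sum(vals) / len(vals)
    return out


def roar_mask(attr, fraction):
    """Mark the top `fraction` of pixels by attribution (the pixels ROAR removes
    before retraining). Ties are broken deterministically by descending (row, col)."""
    h, w = len(attr), len(attr[0])
    k = int(round(fraction * h * w))
    ranked = sorted(((attr[i][j], i, j) for i in range(h) for j in range(w)), reverse=True)
    mask = [[False] * w for _ in range(h)]
    for _, i, j in ranked[:k]:
        mask[i][j] = True
    return mask


def remove_pixels(image, mask, fill):
    """Replace masked pixels with an uninformative constant."""
    return [[fill if mask[i][j] else image[i][j] for j in range(len(image[0]))]
            for i in range(len(image))]


# A single sharp peak versus its blurred version.
attr = [[0.0] * 4 for _ in range(4)]
attr[1][1] = 1.0
sharp = roar_mask(attr, 0.25)
blurred = roar_mask(box_blur(attr), 0.25)
```

In this toy case the blurred map no longer ranks the peak pixel itself among the top 25%. The two maps therefore lead ROAR to remove different pixels, even though the blurred map carries no extra information.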

2512.23566 2026-06-12 math.DS cond-mat.stat-mech cs.LG math.OC stat.ML Version update

From geometry to dynamics: Learning overdamped Langevin dynamics from sparse observations with geometric constraints

Dimitra Maoutsa

Affiliations: Dimitra Maoutsa

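AI summary: Proposes a stochastic control framework that recovers overdamped Langevin dynamics from sparsely time-sampled data without assuming a parametric model. It does so through path augmentation guided by the geometry of the system's invariant density.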

Comments
10+54 pages, 14 figures; accepted at ICML 2026. An earlier account of this work previously appeared in arXiv:2301.08102 and arXiv:2304.00423; the main methodology remains the same, and this version includes additional numerical experiments and theory.
Abstract

How can we learn the laws underlying the dynamics of stochastic systems when their trajectories are sampled sparsely in time? Existing methods either require temporally resolved high-frequency observations or rely on geometric arguments that apply only to conservative systems, limiting the range of dynamics they can recover. Here, we present a new framework that reconciles these two perspectives by reformulating inference as a stochastic control problem. Our method uses geometry-driven path augmentation, guided by the geometry of the system's invariant density, to reconstruct likely trajectories and infer the underlying dynamics without assuming specific parametric models. Applied to overdamped Langevin systems, our approach accurately recovers stochastic dynamics even from extremely undersampled data, outperforming existing methods in synthetic benchmarks. This work demonstrates the effectiveness of incorporating geometric inductive biases into stochastic system identification methods.
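Overdamped Langevin dynamics follow dX = -∇U(X) dt + √(2D) dW. The sketch below generates the kind of sparsely sampled data the paper targets. It integrates a 1-D double-well potential with the Euler-Maruyama scheme and keeps only every k-th state. The potential, the diffusion constant D, the step size, and the subsampling interval are assumptions. The paper's geometry-guided inference is not reproduced here.

```python
import math
import random


def grad_double_well(x):
    # U(x) = x**4 / 4 - x**2 / 2, so U'(x) = x**3 - x (minima at x = -1 and x = 1).
    return x ** 3 - x


def simulate_overdamped_langevin(x0, grad_u, diffusion, dt, n_steps, seed=0):
    """Euler-Maruyama integration of dX = -U'(X) dt + sqrt(2 D) dW."""
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * diffusion * dt)
    x, xs = x0, [x0]
    for _ in range(n_steps):
        x = x - grad_u(x) * dt + noise_scale * rng.gauss(0.0, 1.0)
        xs.append(x)
    return xs


def subsample(xs, every):
    """Keep one observation every `every` integration steps (sparse sampling)."""
    return xs[::every]


# Fine-grained path, then the sparse observations an inference method would see.
path = simulate_overdamped_langevin(0.0, grad_double_well, diffusion=0.5, dt=0.01, n_steps=1000, seed=1)
observations = subsample(path, 100)
```

With diffusion=0 the scheme reduces to gradient descent on U, which is a quick sanity check: started at x0 = 2, it settles at the minimum x = 1.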

2512.21227 2026-06-12 cond-mat.mtrl-sci cs.AI Version update

PhononBench: A Large-Scale Phonon-Based Benchmark for Dynamical Stability in Crystal Generation

Xiao-Qi Han, Ze-Feng Gao, Wen-Kao Li, Peng-Jie Guo, Zhong-Yi Lu

Affiliations: School of Physics, Renmin University of China

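AI summary: Proposes PhononBench, the first large-scale benchmark of dynamical stability for AI-generated crystals. It uses the MatterSim potential to compute phonons efficiently and evaluates 133,838 structures generated by 7 models. It finds an average dynamical-stability rate of only 32.15%.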

Comments
53 pages, 6 figures
Abstract

In recent years, generative artificial intelligence has made significant advances in the design of crystalline materials, giving rise to approaches based on graph neural networks, diffusion models, and large language models. Existing evaluations commonly follow the stability-uniqueness-novelty (S.U.N.) framework, where stability is primarily assessed using thermodynamic criteria, which do not fully capture the dynamical stability essential for a material's practical existence. Dynamical stability is a key determinant of whether a material can be synthesized and persist, with phonon spectrum calculations serving as the standard for its evaluation. However, the high computational cost of such calculations has prevented large-scale assessment of dynamical stability in generated crystals. In this work, we introduce PhononBench, the first large-scale benchmark for dynamical stability in AI-generated crystals. Leveraging the recently developed MatterSim interatomic potential, which achieves density-functional-theory (DFT)-level accuracy in phonon predictions across more than 10,000 materials, PhononBench enables efficient phonon calculations and dynamical-stability analysis for 133,838 crystal structures generated by 7 leading crystal generation models. Unless otherwise specified, all reported dynamical-stability metrics are evaluated at a phonon-frequency threshold of -0.1 THz. PhononBench reveals a widespread limitation of current generative models: the average dynamical-stability rate across all generated structures is only 32.15%, and the top-performing model, MatterGen, reaches just 45.05%. In addition, we identify 32,995 crystal structures that are phonon-stable across the entire Brillouin zone under a strict threshold of -0.001 THz. Finally, a web-based service is accessible at this http URL, enabling minute-level ultra-fast phonon predictions.
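One way to read the stability-rate metric: a structure counts as dynamically stable when none of its phonon frequencies falls below the threshold. Imaginary modes are conventionally reported as negative frequencies. The rate is then the stable fraction. This minimal sketch assumes the frequencies have already been computed (for example with MatterSim) on some Brillouin-zone sampling. Whether a frequency exactly at the threshold counts as stable is an assumption.

```python
def is_dynamically_stable(frequencies_thz, threshold_thz=-0.1):
    """True if no sampled phonon frequency lies below the threshold (in THz)."""
    return min(frequencies_thz) >= threshold_thz


def stability_rate(structures, threshold_thz=-0.1):
    """Fraction of structures whose phonon spectra pass the threshold."""
    stable = sum(is_dynamically_stable(f, threshold_thz) for f in structures)
    return stable / len(structures)


# Hypothetical spectra for three generated structures.
structures = [[0.0, 1.2, 3.4], [-0.05, 2.0], [-0.5, 1.0]]
print(stability_rate(structures))          # 2 of 3 pass at -0.1 THz
print(stability_rate(structures, -0.001))  # 1 of 3 passes the strict threshold
```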

2509.07150 2026-06-12 cs.LG cond-mat.mtrl-sci Version update

PLaID++: A Preference Aligned Language Model for Targeted Inorganic Materials Design

Andy Xu, Rohan Desai, Larry Wang, Ethan Ritz, Gabriel Hope

Affiliations: University of California, Berkeley; Stanford University

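AI summary: Proposes PLaID++, which generates stable, novel crystals that satisfy space-group properties at a roughly 50% higher rate than prior methods. It combines reinforcement learning from verifiable rewards with a symmetry-aware Wyckoff text representation and temperature scaling as an entropy regularizer.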

Comments
Code available at this https URL, model weights at this https URL
Abstract

Reinforcement Learning from Verifiable Rewards (RLVR) has emerged as a promising approach to improve correctness in LLMs; however, in many scientific problems, the objective is not necessarily to produce the correct answer but to produce a diverse array of candidates that satisfy a set of constraints. We study this challenge in the context of materials generation. To this end, we introduce PLaID++, an LLM post-trained for stable and property-guided crystal generation. We find that performance hinges on our crystallographic representation and reward formulation. First, we introduce a compact, symmetry-informed Wyckoff text representation which improves computational efficiency and encourages generalization from physical priors. Second, we demonstrate that temperature scaling acts as an entropy regularizer which counteracts mode collapse and encourages exploration. By encoding symmetry constraints directly into text and guiding model outputs towards desirable chemical space, PLaID++ generates structures that are thermodynamically stable, unique, and novel at a $\sim$50\% greater rate than prior methods and conditionally generates structures with desired space group properties. Our work demonstrates the potential of adapting post-training techniques from natural language processing to materials design, paving the way for targeted and efficient discovery of novel materials.
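Temperature scaling divides the logits by T before the softmax. Setting T > 1 flattens the sampling distribution and raises its entropy, which is why it can counteract mode collapse. The sketch below only illustrates that effect on a toy logit vector. It is not PLaID++'s training code, and the logits and temperatures are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax (max-subtracted for numerical stability)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


logits = [2.0, 1.0, 0.1]  # hypothetical next-token logits
for t in (0.5, 1.0, 2.0):
    print(t, round(entropy(softmax(logits, t)), 3))
```

For any non-constant logit vector, entropy increases strictly with T and approaches log(n) as T grows.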

2511.23030 2026-06-12 cs.RO cs.CV Version update

DiskChunGS: Large-Scale 3D Gaussian SLAM Through Chunk-Based Memory Management

Casimir Feldmann, Maximum Wilder-Smith, Vaishakh Patil, Michael Oechsle, Michael Niemeyer, Keisuke Tateno, Marco Hutter

Affiliations: Robotic Systems Lab, ETH Zurich; Google

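AI summary: Proposes DiskChunGS, which partitions scenes into spatial chunks and stores inactive regions on disk to get around GPU memory limits. This enables large-scale 3D Gaussian SLAM that completes full-sequence reconstruction on several datasets with improved visual quality.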

Abstract

Recent advances in 3D Gaussian Splatting (3DGS) have demonstrated impressive results for novel view synthesis with real-time rendering capabilities. However, integrating 3DGS with SLAM systems faces a fundamental scalability limitation: methods are constrained by GPU memory capacity, restricting reconstruction to small-scale environments. We present DiskChunGS, a scalable 3DGS SLAM system that overcomes this bottleneck through an out-of-core approach that partitions scenes into spatial chunks and maintains only active regions in GPU memory while storing inactive areas on disk. Our architecture integrates seamlessly with existing SLAM frameworks for pose estimation and loop closure, enabling globally consistent reconstruction at scale. We validate DiskChunGS on indoor scenes (Replica, TUM-RGBD), urban driving scenarios (KITTI), and resource-constrained Nvidia Jetson platforms. Our method uniquely completes all 11 KITTI sequences without memory failures while achieving superior visual quality, demonstrating that algorithmic innovation can overcome the memory constraints that have limited previous 3DGS SLAM methods.
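The out-of-core idea can be sketched as a chunk cache:

- Points map to integer chunk keys.
- A bounded number of chunks stay resident, standing in for GPU memory.
- The least-recently-used chunk is evicted to backing storage, here a plain dict standing in for disk.

The LRU policy, the chunk size, and the `ChunkStore` name are assumptions for illustration; the abstract does not specify DiskChunGS's eviction policy. `capacity` must be at least 1.

```python
from collections import OrderedDict


class ChunkStore:
    """Hypothetical chunk cache: at most `capacity` chunks resident (the "GPU"),
    least-recently-used chunks evicted to `disk` (a dict standing in for files)."""

    def __init__(self, chunk_size, capacity):
        self.chunk_size = chunk_size
        self.capacity = capacity
        self.resident = OrderedDict()  # chunk key -> list of items (e.g. Gaussians)
        self.disk = {}

    def key_for(self, point):
        # Floor division gives the integer chunk index along each axis.
        return tuple(int(c // self.chunk_size) for c in point)

    def _activate(self, key):
        if key in self.resident:
            self.resident.move_to_end(key)  # mark as most recently used
        else:
            self.resident[key] = self.disk.pop(key, [])  # load from "disk" or create
            if len(self.resident) > self.capacity:
                old_key, payload = self.resident.popitem(last=False)
                self.disk[old_key] = payload  # evict LRU chunk
        return self.resident[key]

    def add(self, point, item):
        self._activate(self.key_for(point)).append(item)

    def get(self, point):
        return list(self._activate(self.key_for(point)))


store = ChunkStore(chunk_size=10.0, capacity=1)
store.add((1.0, 1.0, 1.0), "a")   # chunk (0, 0, 0)
store.add((15.0, 1.0, 1.0), "b")  # chunk (1, 0, 0) evicts chunk (0, 0, 0)
```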

2510.16928 2026-06-12 cs.CL Version update

ChiKhaPo: A Large-Scale Multilingual Benchmark for Evaluating Lexical Comprehension and Generation in Large Language Models

Emily Chang, Niyati Bafna

Affiliations: Toyota Technological Institute at Chicago; Johns Hopkins University, Center for Language and Speech Processing

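AI summary: Existing benchmarks have limited language coverage and focus on higher-order tasks. To address this, the paper proposes the ChiKhaPo benchmark, with 8 subtasks that evaluate the lexical comprehension and generation abilities of LLMs. Two of the subtasks cover 2700+ languages. The authors find that 6 SOTA models perform poorly.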

Abstract

Existing benchmarks for large language models (LLMs) are largely restricted to high- or mid-resource languages, and often evaluate performance on higher-order tasks in reasoning and generation. However, plenty of evidence points to the fact that LLMs lack basic linguistic competence in the vast majority of the world's 3800+ written languages. We introduce ChiKhaPo, consisting of 8 subtasks of varying difficulty designed to evaluate the lexical comprehension and generation abilities of generative models. ChiKhaPo draws on existing lexicons, monolingual data, and bitext, and provides coverage for 2700+ languages for 2 subtasks, surpassing any existing benchmark in terms of language coverage. We further show that 6 SOTA models struggle on our benchmark, and discuss the factors contributing to performance scores, including language family, language resourcedness, task, and comprehension versus generation directions. With ChiKhaPo, we hope to enable and encourage the massively multilingual benchmarking of LLMs.

2511.19716 2026-06-12 math.NA cs.LG Version update

Design Criteria for SGD Preconditioners: Local Conditioning, Noise Floors, and Basin Stability

Mitchell Scott, Tianshi Xu, Ziyuan Tang, Alexandra Pichette-Emmons, Qiang Ye, Yousef Saad, Yuanzhe Xi

Affiliations: Department of Mathematics, Emory University; Department of Mathematics, University of Minnesota Twin Cities; Department of Computer Science, University of Minnesota Twin Cities; Department of Mathematics, University of Kentucky

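AI summary: Targets the slow late-stage convergence of SGD caused by anisotropic curvature and gradient noise. The paper proposes an analysis framework for SGD preconditioned by a symmetric positive definite matrix M and derives bounds in which both the convergence rate and the noise floor are governed by M-dependent quantities. It also gives a basin-stability guarantee for nonconvex objectives and provides design criteria for scientific machine learning.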

Comments
31 pages, 11 figures
Abstract

Stochastic Gradient Descent (SGD) often slows in the late stage of training due to anisotropic curvature and gradient noise. We analyze preconditioned SGD in the geometry induced by a symmetric positive definite matrix $\mathbf{M}$, deriving bounds in which both the convergence rate and the stochastic noise floor are governed by $\mathbf{M}$-dependent quantities: the rate through an effective condition number in the $\mathbf{M}$-metric, and the floor through the product of that condition number and the preconditioned noise level. For nonconvex objectives, we establish a preconditioner-dependent basin-stability guarantee: when smoothness and basin size are measured in the $\mathbf{M}$-norm, the probability that the iterates remain in a well-behaved local region admits an explicit lower bound. This perspective is particularly relevant in Scientific Machine Learning (SciML), where achieving small training loss under stochastic updates is closely tied to physical fidelity, numerical stability, and constraint satisfaction. The framework applies to both diagonal/adaptive and curvature-aware preconditioners and yields a simple design principle: choose $\mathbf{M}$ to improve local conditioning while attenuating noise. Experiments on a quadratic diagnostic and three SciML benchmarks validate the predicted rate-floor behavior.
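On a diagonal quadratic f(x) = 1/2 Σ h_i x_i², the preconditioned update x ← x - η M⁻¹ g takes a simple form, and so does the effective condition number in the M-metric, i.e. the condition number of M^{-1/2} H M^{-1/2}. The sketch below is a toy check of the rate part of the analysis under that diagonal assumption. The Hessian, step sizes, and noise level are illustrative. The noise-floor and basin-stability bounds are not reproduced.

```python
import random


def preconditioned_sgd(hess_diag, m_diag, lr, steps, noise_std, x0, seed=0):
    """Minimize f(x) = 0.5 * sum_i h_i * x_i**2 with noisy gradients g = H x + noise,
    using x <- x - lr * M^{-1} g for a diagonal SPD preconditioner M."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        g = [h * xi + noise_std * rng.gauss(0.0, 1.0) for h, xi in zip(hess_diag, x)]
        x = [xi - lr * gi / mi for xi, gi, mi in zip(x, g, m_diag)]
    return x


def effective_condition_number(hess_diag, m_diag):
    """Condition number of M^{-1/2} H M^{-1/2}; for diagonal H and M this is
    max(h_i / m_i) / min(h_i / m_i)."""
    ratios = [h / m for h, m in zip(hess_diag, m_diag)]
    return max(ratios) / min(ratios)


H_DIAG = [100.0, 1.0]  # hypothetical anisotropic curvature
print(effective_condition_number(H_DIAG, [1.0, 1.0]))  # plain SGD (M = I)
print(effective_condition_number(H_DIAG, H_DIAG))      # curvature-matched M = H
```

With M = H, a single step of size 1 lands on the minimizer. With M = I, the step size must stay below 2/100 to keep the high-curvature coordinate stable, so the low-curvature coordinate is still far from zero after 100 steps. With noise_std > 0, both runs stall at a nonzero noise floor instead of converging to 0.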

2511.19652 2026-06-12 cs.CV Version update

Navigating Gigapixel Pathology Images with Large Multimodal Models

Thomas A. Buckley, Kian R. Weihrauch, Katherine Latham, Andrew Z. Zhou, Padmini A. Manrai, Arjun K. Manrai

Affiliations: Department of Biomedical Informatics, Harvard Medical School; Department of Pathology, Massachusetts General Hospital; Department of Pathology and Laboratory Medicine, Brown University

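AI summary: Proposes GIANT, a training-free method that lets general-purpose multimodal models navigate WSIs autonomously by iteratively selecting multi-magnification crops and aggregating evidence. It achieves state-of-the-art results on four of the five MultiPathQA benchmarks.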

Abstract

Recent advances in large multimodal models have allowed for the development of interactive chat models that can converse and reason about pathology whole-slide images (WSIs). However, existing slide-level chat systems are often highly specialized, typically compressing WSIs into fixed slide-level embeddings or relying on multi-component pipelines, which can lose multi-scale detail and limit generalizability beyond the target task. We present GIANT (Gigapixel Image Agent for Navigating Tissue), a simple, training-free approach that lets general-purpose multimodal models navigate WSIs on their own, iteratively selecting multi-magnification crops and aggregating evidence over time. To evaluate generalizability in WSI question answering and to promote reproducibility, we introduce MultiPathQA, a benchmark suite spanning five clinical challenges and 934 questions over 868 unique WSIs. This includes a new set of 128 pathologist-authored multiple-choice questions designed to mirror real diagnostic search and multi-scale reasoning. Using GPT-5, GIANT outperforms models specialized for pathology question answering, achieving state-of-the-art performance on four out of five benchmarks.

2511.17221 2026-06-12 cs.CV cs.RO Version update

QueryOcc: Query-based Self-Supervision for 3D Semantic Occupancy

Adam Lilja, Ji Lan, Junsheng Fu, Lars Hammarstrand

Affiliations: Chalmers University of Technology; Zenseact

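AI summary: Proposes QueryOcc, a query-based self-supervised framework that learns continuous 3D semantic occupancy directly from 4D spatio-temporal queries across adjacent frames. Supervision comes from vision-foundation-model pseudo-point clouds or lidar data. A contractive scene representation enables long-range supervision under constant memory. On the self-supervised Occ3D-nuScenes benchmark, QueryOcc improves semantic RayIoU by 26% over previous camera-based methods.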

Abstract

Learning 3D scene geometry and semantics from images is a core challenge in computer vision and a key capability for autonomous driving. Since large-scale 3D annotation is prohibitively expensive, recent work explores self-supervised learning directly from sensor data without manual labels. Existing approaches either rely on 2D rendering consistency, where 3D structure emerges only implicitly, or on discretized voxel grids from accumulated lidar point clouds, limiting spatial precision and scalability. We introduce QueryOcc, a query-based self-supervised framework that learns continuous 3D semantic occupancy directly through independent 4D spatio-temporal queries sampled across adjacent frames. The framework supports supervision from either pseudo-point clouds derived from vision foundation models or raw lidar data. To enable long-range supervision and reasoning under constant memory, we introduce a contractive scene representation that preserves near-field detail while smoothly compressing distant regions. QueryOcc surpasses previous camera-based methods by 26% in semantic RayIoU on the self-supervised Occ3D-nuScenes benchmark while running at 11.6 FPS, demonstrating that direct 4D query supervision enables strong self-supervised occupancy learning. this https URL
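The abstract does not give the formula for the contractive scene representation. One widely used contraction with the described behavior is the Mip-NeRF 360 form. It is the identity inside a near-field radius and maps every farther point into a bounded shell. The sketch below uses it only as an assumed stand-in. The `radius` parameter, which sets the near-field extent, is hypothetical.

```python
import math


def contract(point, radius=1.0):
    """Mip-NeRF 360-style contraction in units of `radius`: points within the
    near field are kept as-is; farther points are squashed into 1 < |x| < 2,
    so unbounded space fits in a fixed-size grid or memory budget."""
    scaled = [c / radius for c in point]
    norm = math.sqrt(sum(c * c for c in scaled))
    if norm <= 1.0:
        return tuple(scaled)
    factor = (2.0 - 1.0 / norm) / norm
    return tuple(c * factor for c in scaled)


# With a hypothetical 10 m near field: 5 m stays linear, 20 m and 1 km get compressed.
for d in (5.0, 20.0, 1000.0):
    print(d, contract((d, 0.0, 0.0), radius=10.0))
```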

2511.13271 2026-06-12 cs.SE cs.AI cs.IR Version update

Examining the Usage of Generative AI Models in Student Learning Activities for Software Programming

Rufeng Chen, Shuaishuai Jiang, Jiyun Shen, AJung Moon, Lili Wei

Affiliations: University of Science and Technology of China

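AI summary: Compares the effects of generative AI and conventional online resources on learning to program. AI improves task performance but does not necessarily yield knowledge gains. Beginners over-rely on it, while intermediate students use it selectively. The authors call for using AI as a learning tool rather than a problem-solving tool.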

Comments
9 pages, 4 figures, published at AIWARE 2025
Abstract

The rise of Generative AI (GenAI) tools like ChatGPT has created new opportunities and challenges for computing education. Existing research has primarily focused on GenAI's ability to complete educational tasks and its impact on student performance, often overlooking its effects on knowledge gains. In this study, we investigate how GenAI assistance compares to conventional online resources in supporting knowledge gains across different proficiency levels. We conducted a controlled user experiment with 24 undergraduate students at two levels of programming experience (beginner and intermediate) to examine how students interact with ChatGPT while solving programming tasks. We analyzed task performance, conceptual understanding, and interaction behaviors. Our findings reveal that generating complete solutions with GenAI significantly improves task performance, especially for beginners, but does not consistently result in knowledge gains. Importantly, usage strategies differ by experience: beginners tend to rely heavily on GenAI to complete tasks, often without gaining knowledge in the process, while intermediates adopt more selective approaches. We find that both over-reliance and minimal use result in weaker knowledge gains overall. Based on our results, we call on students and educators to adopt GenAI as a learning tool rather than a problem-solving tool. Our study highlights the urgent need for guidance when integrating GenAI into programming education to foster deeper understanding.

2511.04260 2026-06-12 cs.CV cs.AI Version update

Proto-LeakNet: Towards Signal-Leak Aware Attribution in Synthetic Human Face Imagery

Claudio Giusti, Luca Guarnera, Sebastiano Battiato

Affiliations: Department of Mathematics and Computer Science, University of Catania

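AI summary: Proposes Proto-LeakNet, which exploits signal-leak traces left by diffusion models for interpretable generator attribution. It combines closed-set classification with density-based open-set evaluation. Although trained only on a closed set, it remains effective on unseen generators.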

Comments
44 pages, 27 figures, 11 tables
Abstract

The growing sophistication of synthetic image and deepfake generation models has turned source attribution and authenticity verification into a critical challenge for modern computer vision systems. Recent studies suggest that diffusion pipelines unintentionally imprint persistent statistical traces, known as signal-leaks, within their outputs, particularly in latent representations. Building on this observation, we propose Proto-LeakNet, a signal-leak-aware and interpretable attribution framework that integrates closed-set classification with a density-based open-set evaluation on the learned embeddings, enabling analysis of unseen generators without retraining. Acting in the latent domain of diffusion models, our method re-simulates partial forward diffusion to expose residual generator-specific cues. A temporal attention encoder aggregates multi-step latent features, while a feature-weighted prototype head structures the embedding space and enables transparent attribution. Trained solely on closed-set data and achieving a macro AUC of 98.13\%, Proto-LeakNet learns a latent geometry that remains robust under post-processing, surpassing state-of-the-art methods, and achieves strong separability both between real images and known generators, and between known and unseen ones. The codebase is available at the following link: this https URL.
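Re-simulating partial forward diffusion amounts to sampling x_t = √ᾱ_t · x_0 + √(1 - ᾱ_t) · ε for a few timesteps t. This yields the multi-step latents that a temporal encoder can aggregate. The sketch assumes a DDPM-style linear β schedule (1e-4 to 0.02 over 1000 steps) and uses flat Python lists as latents. The abstract does not state the schedule or timesteps Proto-LeakNet actually uses, so both are hypothetical here.

```python
import math
import random


def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of a DDPM-style linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def partial_forward_diffusion(latent, t, alpha_bars, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a_t) * x_0, (1 - a_t) * I)."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in latent]


def multi_step_latents(latent, steps, alpha_bars, seed=0):
    """Re-noise the same latent at several (hypothetical) timesteps."""
    rng = random.Random(seed)
    return [partial_forward_diffusion(latent, t, alpha_bars, rng) for t in steps]


alpha_bars = linear_alpha_bars()
latents = multi_step_latents([0.5, -1.0, 2.0], steps=[50, 200, 400], alpha_bars=alpha_bars)
```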

2511.11022 2026-06-12 cs.RO Version update

Miniature Testbed for Validating Multi-Agent Cooperative Autonomous Driving

Hyunchul Bae, Eunjae Lee, Jehyeop Han, Minhee Kang, Jaehyeon Kim, Junggeun Seo, Minkyun Noh, Heejin Ahn

Affiliations: School of Electrical Engineering; School of Mechanical Engineering; Korea Advanced Institute of Science and Technology

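AI summary: Proposes CIVAT, a miniature testbed that integrates V2V/V2I communication with the ROS2 framework. Cooperative autonomous driving functions are validated through infrastructure-based perception and intersection management experiments.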

Comments
Accepted by ICRA 2026, 8 pages
Abstract

Cooperative autonomous driving, which extends vehicle autonomy by enabling real-time collaboration between vehicles and smart roadside infrastructure, remains a challenging yet essential problem. However, none of the existing testbeds employ smart infrastructure equipped with sensing, edge computing, and communication capabilities. To address this gap, we design and implement a 1:15-scale miniature testbed, CIVAT, for validating cooperative autonomous driving, consisting of a scaled urban map, autonomous vehicles with onboard sensors, and smart infrastructure. The proposed testbed integrates V2V and V2I communication with the publish-subscribe pattern through a shared Wi-Fi and ROS2 framework, enabling information exchange between vehicles and infrastructure to realize cooperative driving functionality. As a case study, we validate the system through infrastructure-based perception and intersection management experiments.
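The publish-subscribe pattern decouples senders from receivers. Infrastructure publishes to a named topic, and every vehicle subscribed to that topic receives the message without the publisher knowing who is listening. The sketch below is a tiny in-process bus standing in for ROS2 topics over Wi-Fi. It does not use the ROS2 API, and the topic name and message fields are hypothetical.

```python
from collections import defaultdict


class Bus:
    """Tiny in-process publish-subscribe bus (a stand-in for ROS2 topics)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver to every callback registered on this topic, in subscription order.
        for callback in self._subscribers[topic]:
            callback(message)


bus = Bus()
vehicle_a, vehicle_b = [], []
bus.subscribe("/infra/detections", vehicle_a.append)  # hypothetical topic name
bus.subscribe("/infra/detections", vehicle_b.append)
bus.publish("/infra/detections", {"id": 7, "x": 1.2})  # roadside unit shares a detection
bus.publish("/other", "ignored")
```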

2503.10919 2026-06-12 cs.RO eess.SY nlin.PS Version update

Data-Driven Soft Robot Control via Adiabatic Spectral Submanifolds

Roshan S. Kaundinya, John Irvin Alora, Jonas G. Matt, Luis A. Pabon, Marco Pavone, George Haller

Affiliations: Institute for Mechanical Systems, ETH Zürich; Autonomous Systems Lab, Stanford University; Automatic Control Laboratory, ETH Zürich

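AI summary: Addresses the difficulty of controlling soft robots in nonlinear regimes. The paper proposes a model-predictive control strategy based on adiabatic spectral submanifolds (aSSMs) that builds low-dimensional attracting manifolds from data. It achieves accurate trajectory tracking, up to 10x better than other data-driven models.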

Comments
41 pages, 24 figures, IJRR (2026) in press
Abstract

The mechanical complexity of soft robots creates significant challenges for their model-based control. Specifically, linear data-driven models have struggled to control soft robots on complex, spatially extended paths that explore regions with significant nonlinear behavior. To account for these nonlinearities, we develop here a model-predictive control strategy based on the recent theory of adiabatic spectral submanifolds (aSSMs). This theory is applicable because the internal vibrations of heavily overdamped robots decay at a speed that is much faster than the desired speed of the robot along its intended path. In that case, low-dimensional attracting invariant manifolds (aSSMs) emanate from the path and carry the dominant dynamics of the robot. Aided by this recent theory, we devise an aSSM-based model-predictive control scheme purely from data. We demonstrate the effectiveness of our data-driven model in tracking dynamic trajectories across diverse tasks. We validate on high-fidelity, high-dimensional finite-element models of a soft trunk robot and Cosserat-rod-based elastic soft arms, with additional experiments confirming robust performance even in the presence of experimental noise. Notably, we find that five- or six-dimensional aSSM-reduced models outperform the tracking performance of other data-driven modeling methods by a factor of up to 10 across all closed-loop control tasks.

2504.21561 2026-06-12 cs.CV Version update

Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning

Pengxiang Li, Zhi Gao, Bofei Zhang, Yapeng Mi, Xiaojian Ma, Chenrui Shi, Tao Yuan, Yuwei Wu, Yunde Jia, Song-Chun Zhu, Qing Li

Affiliations: Beijing Key Laboratory of Intelligent Information Technology, School of Computer Science & Technology, Beijing Institute of Technology; State Key Laboratory of General Artificial Intelligence, BIGAI; State Key Laboratory of General Artificial Intelligence, Peking University; Harbin Institute of Technology; Guangdong Laboratory of Machine Perception and Intelligent Computing, Shenzhen MSU-BIT University; Department of Automation, Tsinghua University

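AI summary: Proposes SPORT, an iterative loop of task synthesis, step sampling, step verification, and preference tuning. It lets multimodal agents explore and refine tool-usage strategies autonomously, without pre-collected data. SPORT improves results by 6.41% on GTA and 3.64% on GAIA.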

Comments
24 pages
Abstract

Multimodal agents, which integrate a controller (e.g., a vision language model) with external tools, have demonstrated remarkable capabilities in tackling complex multimodal tasks. Existing approaches for training these agents, both supervised fine-tuning and reinforcement learning, depend on extensive human-annotated task-answer pairs and tool trajectories. However, for complex multimodal tasks, such annotations are prohibitively expensive or impractical to obtain. In this paper, we propose SPORT, an iterative tool usage exploration method for multimodal agents that requires no pre-collected data and refines tool-usage trajectories via step-wise preference optimization. Our method enables multimodal agents to autonomously discover effective tool usage strategies through self-exploration and optimization, eliminating the bottleneck of human annotation. SPORT has four iterative components: task synthesis, step sampling, step verification, and preference tuning. We first synthesize multimodal tasks using language models. Then, we introduce a novel trajectory exploration scheme, where step sampling and step verification are executed alternately to solve synthesized tasks. In step sampling, the agent tries different tools and obtains corresponding results. In step verification, we employ a verifier to provide AI feedback to construct step-wise preference data. The data is subsequently used to update the controller for tool usage through preference tuning, producing a SPORT agent. By interacting with real environments, the SPORT agent gradually evolves into a more refined and capable system. Evaluation on the GTA and GAIA benchmarks shows that the SPORT agent achieves 6.41% and 3.64% improvements, respectively, underscoring the generalization and effectiveness of our method. The project page is this https URL.
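Step verification turns sampled alternatives into training signal. One way to sketch it, under assumptions: the verifier scores the candidate tool calls at a step, and the best and worst candidates become a chosen/rejected pair for preference tuning (for example, with a DPO-style objective, which is itself an assumption here). The function name, the verifier interface, and the best-versus-worst pairing rule are hypothetical; the abstract does not say how pairs are formed.

```python
def build_step_preferences(context, candidates, verifier_score):
    """Hypothetical step-wise preference pair from alternative tool calls.

    `candidates` are tool calls proposed at the same step, and
    `verifier_score(context, candidate)` returns a number (higher is better).
    Returns None when there is no preference signal at this step."""
    if len(candidates) < 2:
        return None
    scored = sorted(candidates, key=lambda c: verifier_score(context, c))
    worst, best = scored[0], scored[-1]
    if verifier_score(context, best) == verifier_score(context, worst):
        return None  # all candidates judged equally good: nothing to prefer
    return {"context": context, "chosen": best, "rejected": worst}


# Toy verifier: fixed scores per tool (stand-in for AI feedback).
scores = {"ocr": 0.9, "search": 0.2, "calc": 0.5}
pair = build_step_preferences("q1", ["search", "ocr", "calc"], lambda ctx, c: scores[c])
```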

2510.16380 2026-06-12 cs.CL cs.AI cs.CY cs.HC cs.LG Version update

MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes

Yu Ying Chiu, Michael S. Lee, Rachel Calcott, Brandon Handoko, Paul de Font-Reaulx, Raphaël Millière, Paula Rodriguez, Chen Bo Calvin Zhang, Ziwen Han, Udari Madhushani Sehwag, Yash Maurya, Christina Q Knight, Harry R. Lloyd, Florence Bacus, Conor Downey, Mantas Mazeika, Bing Liu, Yejin Choi, Mitchell L Gordon, Sydney Levine

Affiliations: University of Washington; New York University; Scale AI; Harvard University; University of Michigan; UNC Chapel Hill; Center for AI Safety; Stanford University; MIT; University of Oxford

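AI Summary: Proposes MoReBench, a benchmark of 1,000 moral scenarios and over 23,000 criteria for evaluating the procedural reasoning of language models on moral questions. It finds that existing benchmarks fail to predict model performance and that models favor specific moral frameworks.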

Comments
46 pages, 8 figures, 10 tables. Published in ICLR 2026. Accepted at CHAI workshop and SPP 2026 (non-archival)
Abstract

As AI systems progress, we rely more on them to make decisions with us and for us. To ensure that such decisions are aligned with human values, it is imperative for us to understand not only what decisions they make but also how they come to those decisions. Reasoning language models, which provide both final responses and (partially transparent) intermediate thinking traces, present a timely opportunity to study AI procedural reasoning. Unlike math and code problems, which often have objectively correct answers, moral dilemmas are an excellent testbed for process-focused evaluation because they allow for multiple defensible conclusions. To this end, we present MoReBench: 1,000 moral scenarios, each paired with a set of rubric criteria that experts consider essential to include (or avoid) when reasoning about the scenarios. MoReBench contains over 23 thousand criteria, including identifying moral considerations, weighing trade-offs, and giving actionable recommendations, covering both cases in which AI advises humans on moral decisions and cases in which it makes moral decisions autonomously. Separately, we curate MoReBench-Theory: 150 examples to test whether AI can reason under five major frameworks in normative ethics. Our results show that scaling laws and existing benchmarks on math, code, and scientific reasoning tasks fail to predict models' abilities to perform moral reasoning. Models also show partiality towards specific moral frameworks (e.g., Benthamite Act Utilitarianism and Kantian Deontology), which might be side effects of popular training paradigms. Together, these benchmarks advance process-focused reasoning evaluation towards safer and more transparent AI.
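Rubric-based scoring with "include" and "avoid" criteria can be sketched as follows. This is a hypothetical scheme for illustration only: the `Criterion` type, the equal default weights and the keyword-matching `keyword_judge` are assumptions, and MoReBench's actual criterion format, weighting and grading procedure are not specified in this abstract.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Criterion:
    description: str
    should_include: bool  # True: the reasoning must include it; False: must avoid it
    weight: float = 1.0   # assumed weight, not taken from MoReBench


def rubric_score(trace: str, criteria: List[Criterion],
                 judge: Callable[[str, Criterion], bool]) -> float:
    """Return the weighted fraction of criteria that a reasoning trace satisfies.

    An "include" criterion is satisfied when the judge finds it in the trace;
    an "avoid" criterion is satisfied when the judge does not find it.
    """
    total = sum(c.weight for c in criteria)
    if total == 0:
        return 0.0
    earned = sum(c.weight for c in criteria if judge(trace, c) == c.should_include)
    return earned / total


def keyword_judge(trace: str, c: Criterion) -> bool:
    # Toy stand-in grader: case-insensitive substring match on the description.
    return c.description.lower() in trace.lower()


criteria = [
    Criterion("privacy", should_include=True),
    Criterion("trade-off", should_include=True),
    Criterion("you must lie", should_include=False),
]
score = rubric_score("Protect the patient's privacy above all.", criteria, keyword_judge)
```

Here the trace mentions privacy and avoids the forbidden advice but never weighs a trade-off, so it earns two of the three equally weighted criteria.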

2510.16311 2026-06-12 cs.LG (updated version)

Toward General Digraph Contrastive Learning: A Dual Spatial Perspective

Zhengyu Wu, Daohan Su, Yang Zhang, Xunkai Li, Rong-Hua Li, Guoren Wang

Affiliations: National University of Singapore, University of Science and Technology of China

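AI Summary: Proposes S2-DiGCL, a framework for contrastive learning on directed graphs from a dual complex-domain and real-domain spatial perspective. It combines adaptive modulation of the magnetic Laplacian with path-based subgraph augmentation, improving node classification and link prediction by 4.41% and 4.34%, respectively.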

Abstract

Graph Contrastive Learning (GCL) has emerged as a powerful tool for extracting consistent representations from graphs without relying on labeled information. However, existing methods predominantly focus on undirected graphs, disregarding the directional information that is fundamental to real-world networks (e.g., social networks and recommender systems). In this paper, we introduce S2-DiGCL, a novel framework for directed graph (digraph) contrastive learning that draws spatial insights from both complex-domain and real-domain perspectives. From the complex-domain perspective, S2-DiGCL introduces personalized perturbations into the magnetic Laplacian to adaptively modulate edge phases and directional semantics. From the real-domain perspective, it employs a path-based subgraph augmentation strategy to capture fine-grained local asymmetries and topological dependencies. By jointly leveraging these two complementary spatial views, S2-DiGCL constructs high-quality positive and negative samples, leading to more general and robust digraph contrastive learning. Extensive experiments on 7 real-world digraph datasets demonstrate the superiority of our approach, achieving state-of-the-art (SOTA) performance with improvements of 4.41% in node classification and 4.34% in link prediction under both supervised and unsupervised settings.
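The complex-domain view builds on the magnetic Laplacian. The sketch below uses the standard normalized definition from the digraph literature (popularized by MagNet): A_s = (A + A^T)/2, Theta = 2*pi*q*(A - A^T), and L = I - (D_s^{-1/2} A_s D_s^{-1/2}) * exp(i*Theta), where the product is elementwise. The antisymmetric phase makes L Hermitian, so its eigenvalues are real. The `phase_offset` argument is a hypothetical stand-in for S2-DiGCL's personalized perturbation; the abstract does not say how the paper parameterizes it.

```python
from typing import Optional

import numpy as np


def magnetic_laplacian(A: np.ndarray, q: float = 0.25,
                       phase_offset: Optional[np.ndarray] = None) -> np.ndarray:
    """Normalized magnetic Laplacian of a digraph (MagNet-style definition).

    A: (n, n) nonnegative adjacency matrix, A[i, j] > 0 for an edge i -> j.
    q: charge parameter controlling how strongly edge direction enters the phase.
    phase_offset: optional (n, n) array; its antisymmetric part is added to the
        phase (hypothetical stand-in for a learned per-edge perturbation).
    """
    A = np.asarray(A, dtype=float)
    A_s = 0.5 * (A + A.T)                # symmetrized edge magnitudes
    theta = 2.0 * np.pi * q * (A - A.T)  # antisymmetric phase encodes direction
    if phase_offset is not None:
        # Add only the antisymmetric part so the Laplacian stays Hermitian.
        theta = theta + 0.5 * (phase_offset - phase_offset.T)
    deg = A_s.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nz = deg > 0                         # isolated nodes keep a zero scaling
    inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    norm_adj = inv_sqrt[:, None] * A_s * inv_sqrt[None, :]
    return np.eye(A.shape[0]) - norm_adj * np.exp(1j * theta)


# Directed 3-cycle 0 -> 1 -> 2 -> 0.
A = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])
L = magnetic_laplacian(A, q=0.25)
eigvals = np.linalg.eigvalsh(L)  # real and ascending, since L is Hermitian
```

With q = 0 the phase vanishes and L reduces to the ordinary symmetric normalized Laplacian of the undirected graph; a nonzero q is what lets the spectrum distinguish the directed cycle from its reverse orientation through the phases.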