arXivDaily: arXiv Daily Academic Digest, updated Monday through Friday
2602.09730 2026-06-12 cs.CV cs.LG cs.NA math.NA Version update

Allure of Craquelure: A Variational-Generative Approach to Crack Detection in Paintings

Laura Paul, Holger Rauhut, Martin Burger, Samira Kabri, Tim Roith

Affiliations: Dept. of Mathematics, LMU Munich; Munich Center for Machine Learning; Helmholtz Imaging, Deutsches Elektronen-Synchrotron DESY; Fachbereich Mathematik, University of Hamburg; CIT School, Technical University of Munich

AI summary: Proposes a hybrid method that models crack detection as an inverse problem, using a deep generative model as the prior for the painting together with a Mumford-Shah variational functional and a crack prior; joint optimization yields a pixel-level crack localization map.

Abstract

Recent advances in imaging technologies, deep learning and numerical performance have enabled non-invasive detailed analysis of artworks, supporting their documentation and conservation. In particular, automated detection of craquelure in digitized paintings is crucial for assessing degradation and guiding restoration, yet remains challenging due to the possibly complex scenery and the visual similarity between cracks and crack-like artistic features such as brush strokes or hair. We propose a hybrid approach that models crack detection as an inverse problem, decomposing an observed image into a crack-free painting and a crack component. A deep generative model is employed as a powerful prior for the underlying artwork, while crack structures are captured using a Mumford-Shah-type variational functional together with a crack prior. Joint optimization yields a pixel-level map of crack localizations in the painting.

2602.07106 2026-06-12 cs.CV cs.AI cs.CL Version update

Ex-Omni: Enabling 3D Facial Animation Generation for Omni-modal Large Language Models

Haoyu Zhang, Zhipeng Li, Yiwen Guo, Tianshu Yu

Affiliations: The Chinese University of Hong Kong, Shenzhen; LIGHTSPEED; Independent Researcher

AI summary: Proposes Ex-Omni, which decouples semantic reasoning from temporal generation through a blendshape-aware speech unit generator and a blendshape decoder and introduces a unified token-as-query gated fusion mechanism, enabling omni-modal large language models to generate speech and 3D facial animation in sync.

Abstract

Omni-modal large language models (OLLMs) aim to unify multimodal understanding and generation, yet extending them to jointly produce speech and 3D facial animation remains largely unexplored despite its importance for natural human-computer interaction. A key challenge is the mismatch between the discrete semantic reasoning of LLMs and the dense temporal dynamics required for 3D facial motion. We propose Expressive Omni (Ex-Omni), an open-source model that augments OLLMs with native speech-accompanied 3D facial animation. Ex-Omni decouples semantic reasoning from temporal generation through a blendshape-aware speech unit generator and a blendshape decoder, where speech units provide temporal scaffolding and hidden speech representations carry facially relevant cues. We further introduce a unified token-as-query gated fusion (TQGF) mechanism for controlled semantic injection, as well as InstructS2SF-1200K, a dataset consisting of 1200K samples for pre-training. Extensive experiments show that Ex-Omni maintains competitive speech understanding and generation ability while achieving better audio-visual synchronization and lower face-generation latency than cascaded pipelines.

2602.04675 2026-06-12 cs.LG Version update

Generalized Schrödinger Bridge on Graphs

Panagiotis Theodoropoulos, Juno Nam, Evangelos Theodorou, Jaemoo Choi

Affiliations: University of California, Berkeley

AI summary: Proposes the GSBoG framework, which learns controlled continuous-time Markov chain policies on graphs through likelihood optimization, satisfying the endpoint marginals while optimizing intermediate state costs, for scalable, topology-aware transport.

Abstract

Transportation on graphs is a fundamental challenge across many domains, where decisions must respect topological and operational constraints. Despite the need for actionable policies, existing graph-transport methods lack this expressivity. They rely on restrictive assumptions, fail to generalize across sparse topologies, and scale poorly with graph size and time horizon. To address these issues, we introduce Generalized Schrödinger Bridge on Graphs (GSBoG), a novel scalable data-driven framework for learning executable controlled continuous-time Markov chain (CTMC) policies on arbitrary graphs under state cost augmented dynamics. Notably, GSBoG learns trajectory-level policies, avoiding dense global solvers and thereby enhancing scalability. This is achieved via a likelihood optimization approach, satisfying the endpoint marginals, while simultaneously optimizing intermediate behavior under state-dependent running costs. Extensive experimentation on challenging real-world graph topologies shows that GSBoG reliably learns accurate, topology-respecting policies while optimizing application-specific intermediate state costs, highlighting its broad applicability and paving new avenues for cost-aware dynamical transport on general graphs.
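
To make the notion of a continuous-time Markov chain (CTMC) policy on a graph concrete, here is a minimal sketch of how one trajectory could be sampled when jumps are restricted to graph edges. It is not GSBoG itself: the function name `simulate_ctmc`, the toy graph, and the hand-picked rates are all hypothetical, and a learned policy would supply the rates instead.

```python
import random


def simulate_ctmc(adjacency, rates, start, horizon, rng):
    """Simulate one CTMC trajectory whose jumps are restricted to graph edges.

    adjacency: dict mapping node -> list of neighbouring nodes
    rates:     dict mapping (u, v) -> non-negative jump rate (the "policy")
    Returns a list of (jump_time, node) pairs, starting with (0.0, start).
    """
    t, node = 0.0, start
    path = [(t, node)]
    while True:
        options = [(v, rates.get((node, v), 0.0)) for v in adjacency[node]]
        total = sum(rate for _, rate in options)
        if total <= 0.0:
            break  # no outgoing rate: the chain stays here forever
        t += rng.expovariate(total)  # exponential holding time
        if t >= horizon:
            break
        # Choose the next node with probability proportional to its rate.
        threshold = rng.random() * total
        running = 0.0
        for v, rate in options:
            running += rate
            if threshold < running:
                break
        node = v  # the last option doubles as a fallback on rounding ties
        path.append((t, node))
    return path


# Hypothetical 3-node path graph 0 - 1 - 2 with hand-picked rates.
adjacency = {0: [1], 1: [0, 2], 2: [1]}
rates = {(0, 1): 2.0, (1, 0): 0.5, (1, 2): 1.5, (2, 1): 0.1}
path = simulate_ctmc(adjacency, rates, start=0, horizon=5.0, rng=random.Random(0))
```

Because every jump is drawn from the current node's neighbour list, the sampled trajectory respects the topology by construction, which is the property the abstract emphasizes.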

2601.09693 2026-06-12 cs.LG stat.ML Version update

Contrastive Geometric Learning Unlocks Unified Structure- and Ligand-Based Drug Design

Lisa Schneckenreiter, Sohvi Luukkonen, Lukas Friedrich, Daniel Kuhn, Günter Klambauer

Affiliations: DeepMind Ltd

AI summary: Proposes ConGLUDe, a contrastive geometric model that unifies structure- and ligand-based training and supports virtual screening, target fishing, and ligand-conditioned pocket prediction, with strong results across several benchmarks.

Comments: Forty-Third International Conference on Machine Learning

Abstract

Structure-based and ligand-based computational drug design have traditionally relied on disjoint data sources and modeling assumptions, limiting their joint use at scale. In this work, we introduce Contrastive Geometric Learning for Unified Computational Drug Design (ConGLUDe), a single contrastive geometric model that unifies structure- and ligand-based training. ConGLUDe couples a geometric protein encoder that produces whole-protein representations and implicit embeddings of predicted binding sites with a fast ligand encoder, removing the need for predefined pockets. By aligning ligands with both global protein representations and multiple candidate binding sites through contrastive learning, ConGLUDe supports ligand-conditioned pocket prediction in addition to virtual screening and target fishing, while being trained jointly on protein-ligand complexes and large-scale bioactivity data. Across diverse benchmarks, ConGLUDe achieves competitive zero-shot virtual screening performance, substantially outperforms existing methods on a challenging target fishing task, and demonstrates state-of-the-art ligand-conditioned pocket selection. These results highlight the advantages of unified structure-ligand training and position ConGLUDe as a step toward general-purpose foundation models for drug discovery.
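
The abstract does not state which contrastive objective ConGLUDe uses. As an illustration of what aligning ligands with protein representations through contrastive learning can look like, the sketch below uses the widely used InfoNCE loss on cosine similarities. The function name `info_nce`, the temperature value, and the two-dimensional toy embeddings are assumptions, not details from the paper.

```python
import math


def info_nce(ligand_embs, protein_embs, temperature=0.1):
    """Mean InfoNCE loss where ligand i is the positive match for protein i."""

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def normalize(v):
        norm = math.sqrt(dot(v, v))
        return [x / norm for x in v]

    ligands = [normalize(v) for v in ligand_embs]
    proteins = [normalize(v) for v in protein_embs]
    loss = 0.0
    for i, lig in enumerate(ligands):
        logits = [dot(lig, prot) / temperature for prot in proteins]
        # Numerically stable log-sum-exp over all candidate proteins.
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(s - peak) for s in logits))
        loss += log_z - logits[i]
    return loss / len(ligands)


# Toy embeddings (hypothetical): matched pairs point in the same direction.
ligands = [[1.0, 0.0], [0.0, 1.0]]
matched = [[2.0, 0.0], [0.0, 3.0]]
swapped = [[0.0, 3.0], [2.0, 0.0]]
```

With matched pairs the loss is close to zero, and swapping the proteins makes it large, which is the signal that pulls a ligand toward its own target during training.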

2505.13102 2026-06-12 cs.LG cs.AI eess.SP Version update

Lightweight and Interpretable Transformer via Mixed Graph Algorithm Unrolling for Traffic Forecast

Ji Qi, Tam Thuc Do, Mingxiao Liu, Zhuoshi Pan, Yuzhe Li, Gene Cheung, H. Vicky Zhao

Affiliations: University of Science and Technology of China

AI summary: Proposes a lightweight, interpretable transformer-like network, built by unrolling a mixed-graph optimization algorithm, for spatiotemporal traffic forecasting; it keeps performance competitive while sharply reducing the parameter count.

Comments: 24 pages, 7 figures, 11 tables

Abstract

Unlike conventional "black-box" transformers with a classical self-attention mechanism, we build a lightweight and interpretable transformer-like neural net by unrolling a mixed-graph-based optimization algorithm to forecast traffic with spatial and temporal dimensions. We construct two graphs: an undirected graph $\mathcal{G}^u$ capturing spatial correlations across geography, and a directed graph $\mathcal{G}^d$ capturing sequential relationships over time. We predict future samples of signal $\mathbf{x}$, assuming it is "smooth" with respect to both $\mathcal{G}^u$ and $\mathcal{G}^d$, where we design new $\ell_2$- and $\ell_1$-norm variational terms to quantify and promote signal smoothness (low-frequency reconstruction) on a directed graph. We design an iterative algorithm based on the alternating direction method of multipliers (ADMM), and unroll it into a feed-forward network for data-driven parameter learning. We periodically insert graph learning modules for $\mathcal{G}^u$ and $\mathcal{G}^d$ that play the role of self-attention. Experiments show that our unrolled networks achieve traffic forecast performance competitive with state-of-the-art prediction schemes, while reducing parameter counts drastically.
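
A small numeric sketch may help show what smoothness with respect to a graph measures. The undirected term below is the standard graph Laplacian quadratic form. The directed residual (each node's value minus the weighted average of its predecessors) is only an assumed stand-in, since the abstract does not give the paper's exact $\ell_2$ and $\ell_1$ definitions. All function names are hypothetical.

```python
def undirected_smoothness(edges, x):
    """Laplacian quadratic form: sum of w * (x[i] - x[j])**2 over undirected edges."""
    return sum(w * (x[i] - x[j]) ** 2 for i, j, w in edges)


def directed_residual(in_edges, x):
    """Per node: x[j] minus the weighted mean of its predecessors (0 if none).

    in_edges[j] is a list of (predecessor_index, weight) pairs for node j.
    """
    residual = []
    for j, preds in enumerate(in_edges):
        total = sum(w for _, w in preds)
        if total == 0:
            residual.append(0.0)
        else:
            residual.append(x[j] - sum(w * x[i] for i, w in preds) / total)
    return residual


def l2_variation(residual):
    return sum(r * r for r in residual)


def l1_variation(residual):
    return sum(abs(r) for r in residual)


# Toy example: three nodes on a line, both as a spatial graph and as a time chain.
edges = [(0, 1, 1.0), (1, 2, 1.0)]        # undirected, unit weights
in_edges = [[], [(0, 1.0)], [(1, 1.0)]]   # directed chain 0 -> 1 -> 2
signal = [0.0, 1.0, 3.0]
```

A constant signal scores zero on all three measures, while the toy signal scores 5 on the Laplacian term and 5 and 3 on the assumed $\ell_2$ and $\ell_1$ directed terms. Minimizing such terms therefore favors predictions that vary slowly along graph edges.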

2602.02181 2026-06-12 cs.RO Version update

Extending the Law of Intersegmental Coordination: Implications for Powered Prosthetic Controls

Elad Siman Tov, Nili E. Krausz

Affiliations: Faculty of Mechanical Engineering, Technion – Israel Institute of Technology

AI summary: Targeting the metabolic cost of walking for lower-limb amputees, proposes a prosthetic control framework based on the Law of Intersegmental Coordination, extends the law to a coordination of moments by analyzing 3D kinematic data, and releases an open-source toolbox.

Comments: Submitted to 2026 IEEE International Conference on Biomedical Robotics and Biomechatronics (BioRob)

Abstract

Powered prostheses are capable of providing net positive work to amputees and have advanced in the past two decades. However, reducing amputee metabolic cost of walking remains an open problem. The Law of Intersegmental Coordination (ISC) has been observed across gaits and previously implicated in energy expenditure of walking, yet it has rarely been analyzed or applied within the context of lower-limb amputee gait. This law states that the elevation angles of the thigh, shank and foot over the gait cycle covary. In this work, we developed a method to analyze intersegmental coordination for lower-limb 3D kinematic data, to simplify ISC analysis. Moreover, inspired by motor control, biomechanics and robotics literature, we used our method to extend ISC to a new law of coordination of moments. We find these Elevation Space Moments (ESM) and present results showing a moment-based coordination for able-bodied gait. We also analyzed ISC for amputee gait with powered and passive prostheses, and found that while elevation angles remained planar, the ESM lacked planar coordination. We present an ISC-driven powered prosthetic control framework, using healthy coordination as a constraint to predict the shank angles/moments to compensate for alterations due to a passive foot. We developed the ISC3d toolbox, which is freely available online and may be used to compute kinematic and kinetic ISC in 3D. This provides a means to further study the role of coordination in gait and may help address fundamental questions of the neural control of human movement.

2602.01572 2026-06-12 cs.CL cs.IR Version update

LLM-based Embeddings: Attention Values Encode Sentence Semantics Better Than Hidden States

Yeqin Zhang, Yunfei Wang, Jiaxuan Chen, Ke Qin, Yizheng Zhao, Cam-Tu Nguyen

AI summary: Proposes Value Aggregation, which builds sentence embeddings from an LLM's attention value vectors rather than its hidden states; in the training-free setting it outperforms existing methods and even matches or surpasses the ensemble-based MetaEOL.

Abstract

Sentence representations are foundational to many Natural Language Processing (NLP) applications. While recent methods leverage Large Language Models (LLMs) to derive sentence representations, most rely on final-layer hidden states, which are optimized for next-token prediction and thus often fail to capture global, sentence-level semantics. This paper introduces a novel perspective, demonstrating that attention value vectors capture sentence semantics more effectively than hidden states. We propose Value Aggregation (VA), a simple method that pools token values across multiple layers and token indices. In a training-free setting, VA outperforms other LLM-based embeddings and even matches or surpasses the ensemble-based MetaEOL. Furthermore, we demonstrate that when paired with suitable prompts, the layer attention outputs can be interpreted as aligned weighted value vectors. Specifically, the attention scores of the last token function as the weights, while the output projection matrix ($W_O$) aligns these weighted value vectors with the common space of the LLM residual stream. This refined method, termed Aligned Weighted VA (AlignedWVA), achieves state-of-the-art performance among training-free LLM-based embeddings, outperforming the high-cost MetaEOL by a substantial margin. Finally, we highlight the potential of obtaining strong LLM embedding models through fine-tuning Value Aggregation.
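
The pooling step behind Value Aggregation can be pictured with a toy sketch. The abstract says only that token values are pooled across layers and token indices. The plain mean, the nested-list layout `values[layer][token]`, and the function name `value_aggregation` are assumptions made for illustration, and no real LLM is involved.

```python
def value_aggregation(values, layers):
    """Mean-pool value vectors over the chosen layers and over all tokens.

    values[layer][token] is one attention value vector (a list of floats).
    """
    pooled = None
    count = 0
    for layer in layers:
        for vector in values[layer]:
            if pooled is None:
                pooled = [0.0] * len(vector)
            for k, component in enumerate(vector):
                pooled[k] += component
            count += 1
    return [total / count for total in pooled]


# Toy value vectors for 2 layers x 2 tokens x 2 dimensions (hypothetical numbers).
values = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]
embedding = value_aggregation(values, layers=[0, 1])
```

In a real setting the vectors would come from the value projections of selected attention layers, and the choice of layers is a design parameter the abstract does not specify.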

2512.12571 2026-06-12 cs.CV Version update

Measurement Plasticity: Sensor-Level Adaptation for Vision-Language Models

Boyeong Im, Wooseok Lee, Yoojin Kwon, Hyung-Sin Kim

Affiliations: University of Seoul

AI summary: Proposes Multi-View Physical-prompt (MVP) for test-time adaptation, which treats the camera exposure triangle (ISO, shutter speed, aperture) as physical prompts and adapts at the sensor level without gradients or model modifications, outperforming digital-only methods on ImageNet-ES.

Comments: Accepted to the ICML 2026 Workshop on Continual Adaptation at Scale

Abstract

We propose Multi-View Physical-prompt (MVP) for Test-Time Adaptation (TTA), a forward-only framework that moves TTA from tokens to photons by treating the camera exposure triangle (i.e., ISO, shutter speed, and aperture) as physical prompts. At inference, MVP acquires multiple physical views selected using a source-affinity score, evaluates digitally augmented variants of each retained view, filters the lowest-entropy predictions, and aggregates predictions with hard voting. This selection-then-vote design is simple, calibration-friendly, and requires no gradients or model modifications. On ImageNet-ES and ImageNet-ES-Diverse, MVP outperforms digital-only TTA on both Auto-Exposure and a combination with conventional sensor control. MVP remains effective under reduced parameter candidates that lower capture latency, demonstrating its practicality.
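
The selection-then-vote step can be sketched in a few lines. This is a simplified reading rather than the MVP implementation: it assumes "filters the lowest-entropy predictions" means keeping the most confident predictions, it skips view acquisition and the source-affinity score entirely, and the names `entropy`, `select_then_vote`, and `keep` are hypothetical.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_then_vote(view_probs, keep):
    """Keep the `keep` lowest-entropy predictions, then hard-vote on their labels."""
    confident = sorted(view_probs, key=entropy)[:keep]
    labels = [max(range(len(p)), key=p.__getitem__) for p in confident]
    return Counter(labels).most_common(1)[0][0]


# Hypothetical softmax outputs for four captured or augmented views, three classes.
view_probs = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.34, 0.33, 0.33],
    [0.10, 0.85, 0.05],
]
winner = select_then_vote(view_probs, keep=3)
```

Nothing here needs gradients or changes to the model, which matches the forward-only framing in the abstract: the only operations are sorting by entropy and counting argmax labels.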

2601.22594 2026-06-12 cs.CL cs.AI Version update

Language Model Circuits Are Sparse in the Neuron Basis

Aryaman Arora, Zhengxuan Wu, Jacob Steinhardt, Sarah Schwettmann

Affiliations: Stanford University

AI summary: Shows empirically that MLP neurons are as sparse a feature basis as sparse autoencoders and, building on this, develops an end-to-end gradient-based attribution pipeline that surfaces causally effective neuron circuits across several tasks.

Comments: ICML Spotlight, camera-ready

Abstract

The high-level concepts that a neural network uses to perform computation need not be aligned to individual neurons (Smolensky, 1986). Language model interpretability research has thus turned to techniques which decompose the neuron basis into more interpretable units of model computation, such as sparse autoencoders (SAEs). However, not all neuron-based representations are uninterpretable. For the first time, we empirically show that MLP neurons are as sparse a feature basis as SAEs. We use this finding to develop an end-to-end gradient-based attribution pipeline for circuit tracing on the MLP neuron basis, which surfaces causally effective neurons on a variety of tasks. On a standard subject-verb agreement benchmark (Marks et al., 2025), a circuit of $\approx 10^2$ MLP neurons is enough to control model behaviour. On the multi-hop city-state-capital task from (Lindsey et al., 2025), we find a circuit in which small sets of neurons encode specific latent reasoning steps (e.g. mapping a city to its state), and can be steered to change the model's output. This work thus advances automated interpretability of language models without imposing additional training costs.

2601.22090 2026-06-12 cs.RO Version update

ReactEMG Stroke: Healthy-to-Stroke Few-shot Adaptation for sEMG-Based Intent Detection

Runsheng Wang, Katelyn Lee, Xinyue Zhu, Lauren Winterbottom, Dawn M. Nilsen, Joel Stein, Matei Ciocarlie

Affiliations: Department of Mechanical Engineering, Columbia University in the City of New York; Department of Computer Science, Columbia University in the City of New York; Department of Rehabilitation and Regenerative Medicine, Columbia University Irving Medical Center

AI summary: Proposes a healthy-to-stroke adaptation pipeline that starts from a model pretrained on large-scale sEMG from able-bodied subjects and fine-tunes it with only a small amount of stroke-participant data, markedly improving intent detection accuracy and robustness.

Abstract

Surface electromyography (sEMG) is a promising control signal for assist-as-needed hand rehabilitation after stroke, but detecting intent from paretic muscles often requires lengthy, subject-specific calibration and remains brittle to variability. We propose a healthy-to-stroke adaptation pipeline that initializes an intent detector from a model pretrained on large-scale able-bodied sEMG, then fine-tunes it for each stroke participant using only a small amount of subject-specific data. Using a newly collected dataset from three individuals with chronic stroke, we compare adaptation strategies (head-only tuning, parameter-efficient LoRA adapters, and full end-to-end fine-tuning) and evaluate on held-out test sets that include realistic distribution shifts such as within-session drift, posture changes, and armband repositioning. Across conditions, healthy-pretrained adaptation consistently improves stroke intent detection relative to both zero-shot transfer and stroke-only training under the same data budget; the best adaptation methods improve average transition accuracy from 0.42 to 0.61 and raw accuracy from 0.69 to 0.78. These results suggest that transferring a reusable healthy-domain EMG representation can reduce calibration burden while improving robustness for real-time post-stroke intent detection. Our project website, video, code, and dataset are available at: https://roamlab.github.io/reactemg-stroke/.
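
One of the adaptation strategies compared above is a LoRA adapter. As a rough sketch of the general LoRA idea (not the authors' code), the layer below keeps a pretrained weight matrix frozen and adds a trainable low-rank correction. The names `vec_mat`, `lora_forward`, `down`, and `up`, and all numbers, are hypothetical.

```python
def vec_mat(x, matrix):
    """Row vector times matrix, using plain Python lists."""
    return [
        sum(x[i] * matrix[i][j] for i in range(len(x)))
        for j in range(len(matrix[0]))
    ]


def lora_forward(x, frozen_w, down, up, scale=1.0):
    """Compute x @ W + scale * (x @ down) @ up.

    frozen_w: pretrained weights (d_in x d_out), left untouched during adaptation
    down:     trainable projection to rank r (d_in x r)
    up:       trainable projection back (r x d_out), usually initialised to zeros
    """
    base = vec_mat(x, frozen_w)
    delta = vec_mat(vec_mat(x, down), up)
    return [b + scale * d for b, d in zip(base, delta)]


# Toy 2x2 layer with a rank-1 adapter.
frozen_w = [[1.0, 0.0], [0.0, 1.0]]
down = [[1.0], [1.0]]
x = [2.0, 3.0]
before = lora_forward(x, frozen_w, down, up=[[0.0, 0.0]])   # adapter is a no-op
after = lora_forward(x, frozen_w, down, up=[[1.0, -1.0]])   # adapter shifts output
```

With `up` at zero the adapted layer reproduces the pretrained output exactly, so fine-tuning starts from the healthy-subject model. Only `r * (d_in + d_out)` adapter values are trained instead of `d_in * d_out`, which suits small per-participant data budgets.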

2601.21570 2026-06-12 cs.AI cs.RO Version update

From Digital to Physical: Digital Agents as Autonomous Coaches for Physical Intelligence

Zixing Lei, Genjia Liu, Yuanshuo Zhang, Qipeng Liu, Yuzhu Cai, Sixiang Chen, Jixian Wu, Yunhong Wang, Weixin Li, Chuan Wen, Bo Zhao, Shanghang Zhang, Wenzhao Lian, Siheng Chen

Affiliations: School of Artificial Intelligence, Shanghai Jiao Tong University, Shanghai, China; Zhongguancun Academy, Beijing, China; School of Integrated Circuits, Shanghai Jiao Tong University, Shanghai, China; School of Computer Science, Shanghai Jiao Tong University, Shanghai, China; State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University, Beijing, China

AI summary: Proposes the EmboCoach-Bench benchmark, which evaluates the ability of LLM agents to autonomously engineer embodied policies; through iterative debugging and optimization, agents surpass human-engineered baselines by 26.5% in average success rate and show self-correction capabilities.

Comments: 53 pages, 12 figures

Abstract

The field of Embodied AI is witnessing a rapid evolution toward general-purpose robotic systems, fueled by high-fidelity simulation and large-scale data collection. However, this scaling capability remains severely bottlenecked by a reliance on labor-intensive manual oversight, from intricate reward shaping to hyperparameter tuning across heterogeneous backends. Inspired by LLMs' success in software automation and scientific discovery, we introduce EmboCoach-Bench, a benchmark evaluating the capacity of LLM agents to autonomously engineer embodied policies. Spanning 32 expert-curated RL and IL tasks, our framework posits executable code as the universal interface. We move beyond static generation to assess a dynamic closed-loop workflow, where agents leverage environment feedback to iteratively draft, debug, and optimize solutions, spanning improvements from physics-informed reward design to policy architectures such as diffusion policies. Extensive evaluations yield three critical insights: (1) autonomous agents can qualitatively surpass human-engineered baselines by 26.5% in average success rate; (2) an agentic workflow with environment feedback effectively strengthens policy development and substantially narrows the performance gap between open-source and proprietary models; and (3) agents exhibit self-correction capabilities for pathological engineering cases, successfully resurrecting task performance from near-total failures through iterative simulation-in-the-loop debugging. Ultimately, this work establishes a foundation for self-evolving embodied intelligence, accelerating the paradigm shift from labor-intensive manual tuning to scalable, autonomous engineering in the embodied AI field.

2601.13346 2026-06-12 cs.CL Version update

AfroScope: A Framework for Studying the Linguistic Landscape of Africa

Sang Yun Kwon, AbdelRahim Elmadany, Muhammad Abdul-Mageed

Affiliations: The University of British Columbia

AI summary: Proposes the AfroScope framework, comprising a dataset covering 640 languages and a model suite; hierarchical classification with a specialized embedding model addresses confusion among closely related languages, improving macro-F1 by 1.57 points, and the work analyzes cross-lingual transfer and domain effects.

Abstract

Language Identification (LID), the task of determining the language of a given text, is a fundamental preprocessing step that shapes the reliability of downstream NLP applications. While recent work has expanded African LID, existing systems remain limited in both language coverage and fine-grained discrimination among closely related languages and varieties. We introduce AfroScope, a unified framework for African LID that includes AfroScope-Data, a dataset covering 640 languages, and AfroScope-Models, a suite of strong LID models with broad African language coverage. To address persistent confusions among closely related languages, we propose a hierarchical classification approach that leverages AfroScope-Mirror, a specialized embedding model for targeted disambiguation, improving macro-F1 by 1.57 points on the confusable subset compared to our best base model. We further analyze cross-lingual transfer and domain effects, showing how language-family structure, script compatibility, and domain coverage shape LID performance. We position African LID as an enabling technology for large-scale measurement of Africa's linguistic landscape in digital text, and release AfroScope-Data and AfroScope-Models online.
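
Since the headline number above is a macro-F1 gain, a short sketch of how macro-F1 is computed may be useful. This is the standard definition (the unweighted mean of per-label F1), not AfroScope's evaluation code, and the labels and predictions are invented for illustration. The codes `zul`, `xho`, and `swa` are the ISO 639-3 codes for Zulu, Xhosa, and Swahili.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Each label occurs at least once, so the denominator is never zero.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


# Invented example: one Zulu sentence is mistaken for closely related Xhosa.
y_true = ["zul", "zul", "xho", "xho", "swa"]
y_pred = ["zul", "xho", "xho", "xho", "swa"]
score = macro_f1(y_true, y_pred)
```

Because every language counts equally regardless of how many test sentences it has, confusions between closely related low-resource languages weigh on macro-F1 far more than they would on plain accuracy.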

2512.22287 2026-06-12 cs.LG cs.AI Version update

Cluster Aggregated GAN (CAG): A Cluster-Based Hybrid Model for Appliance Pattern Generation

Zikun Guo, Adeyinka. P. Adedigba, Rammohan Mallipeddi

Affiliations: Department of Artificial Intelligence, School of Electronics Engineering, Kyungpook National University

AI summary: To address the unstable training and limited fidelity that result when existing generative methods ignore the behavioral differences between intermittent and continuous appliances, proposes the CAG framework: a clustering module assigns dedicated generators to intermittent appliances, continuous appliances use an LSTM generator, and the framework outperforms baselines on the UVIC dataset.

Comments: 18 pages, 5 figures

Abstract

Synthetic appliance data are essential for developing non-intrusive load monitoring algorithms and enabling privacy-preserving energy research, yet the scarcity of labeled datasets remains a significant barrier. Recent GAN-based methods have demonstrated the feasibility of synthesizing load patterns, but most existing approaches treat all devices uniformly within a single model, neglecting the behavioral differences between intermittent and continuous appliances and resulting in unstable training and limited output fidelity. To address these limitations, we propose the Cluster Aggregated GAN framework, a hybrid generative approach that routes each appliance to a specialized branch based on its behavioral characteristics. For intermittent appliances, a clustering module groups similar activation patterns and allocates dedicated generators for each cluster, ensuring that both common and rare operational modes receive adequate modeling capacity. Continuous appliances follow a separate branch that employs an LSTM-based generator to capture gradual temporal evolution while maintaining training stability through sequence compression. Extensive experiments on the UVIC smart plug dataset demonstrate that the proposed framework consistently outperforms baseline methods across metrics measuring realism, diversity, and training stability, and that integrating clustering as an active generative component substantially improves both interpretability and scalability. These findings establish the proposed framework as an effective approach for synthetic load generation in non-intrusive load monitoring research.

2601.17654 2026-06-12 cs.LG cs.DC Version update

Kareus: Joint Reduction of Dynamic and Static Energy in Large Model Training

Ruofan Wu, Jae-Won Chung, Mosharaf Chowdhury

Affiliations: University of Michigan

AI summary: To address the high energy cost of AI training, proposes Kareus, a system that jointly optimizes fine-grained kernel scheduling and frequency scaling to reduce both dynamic and static energy, saving up to 28.3% energy at the same training time or cutting training time by up to 27.5% at the same energy.

Comments: OSDI '26 | Open-source at https://github.com/ml-energy/kareus

Abstract

The computing demand of AI is growing at an unprecedented rate, but energy supply is not keeping pace. As a result, energy has become an expensive and contended resource that requires explicit management and optimization. Although recent works have made significant progress in large model training optimization, they focus on optimizing either dynamic or static energy consumption. We find that fine-grained kernel scheduling and frequency scaling jointly and interdependently impact both dynamic and static energy consumption. Based on this finding, we design Kareus, a training system that pushes the time-energy tradeoff frontier by optimizing both aspects. Kareus decomposes the intractable joint optimization problem into local, partition-based subproblems. It then uses a multi-pass multi-objective optimization algorithm to find execution schedules that push the time-energy tradeoff frontier. Compared to the state of the art, Kareus reduces training energy by up to 28.3% at the same training time, or reduces training time by up to 27.5% at the same energy consumption.
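
The time-energy tradeoff frontier mentioned above can be illustrated with a toy model. The cost model below is a common textbook approximation, not Kareus's model: static energy grows with running time, while dynamic energy grows roughly with the square of the clock frequency for a fixed amount of work. The names `schedule_cost` and `pareto_frontier` and all parameters are hypothetical.

```python
def schedule_cost(work, freq, static_power, dyn_coeff):
    """Return (time, energy) for a fixed amount of work at one clock frequency.

    Assumed model: time = work / freq,
    energy = static_power * time + dyn_coeff * freq**2 * work.
    """
    time = work / freq
    energy = static_power * time + dyn_coeff * freq ** 2 * work
    return time, energy


def pareto_frontier(candidates):
    """Keep the (time, energy) points that no other point beats on both axes."""
    frontier = []
    best_energy = float("inf")
    for time, energy in sorted(candidates):
        if energy < best_energy:  # strictly better energy than every faster point
            frontier.append((time, energy))
            best_energy = energy
    return frontier


# Hypothetical candidate schedules, given directly as (time, energy) pairs.
candidates = [(10, 100), (12, 90), (11, 120), (15, 90), (9, 130)]
frontier = pareto_frontier(candidates)
```

In this toy model, lowering the frequency reduces dynamic energy but lengthens the run and so raises static energy. That coupling is why the two components have to be optimized together rather than one at a time.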

2509.03340 2026-06-12 cs.LG cs.AI cs.CE physics.comp-ph Version update

Equivariant Flow Matching for Symmetry-Breaking Bifurcation Problems

Fleur Hendriks, Ondřej Rokoš, Martin Doškář, Marc G. D. Geers, Vlado Menkovski

Affiliations: Department of Mechanical Engineering, Eindhoven University of Technology; DIFFER – Dutch Institute for Fundamental Energy Research; Faculty of Civil Engineering, Department of Mechanics, Czech Technical University in Prague; Department of Mathematics and Computer Science, Eindhoven University of Technology

AI summary: For nonlinear dynamical systems where symmetry breaking leads to multiple coexisting stable states, proposes equivariant flow matching, combining equivariant architectures with an optimal-transport-based coupling mechanism to accurately capture multimodal distributions and symmetry-breaking bifurcations, outperforming non-probabilistic and variational methods.

Comments: 9 pages, 7 figures including appendices. Accepted to Machine Learning and the Physical Sciences Workshop, NeurIPS 2025 (https://ml4physicalsciences.github.io/2025/). Repository with corresponding code: https://github.com/FHendriks11/bifurcationML/. Video explanation: https://www.youtube.com/watch?v=wsL3h17KtjY

Abstract

Bifurcation phenomena in nonlinear dynamical systems often lead to multiple coexisting stable solutions, particularly in the presence of symmetry breaking. Deterministic machine learning models are unable to capture this multiplicity, averaging over solutions and failing to represent lower-symmetry outcomes. In this work, we formalize the use of generative AI, specifically flow matching, as a principled way to model the full probability distribution over bifurcation outcomes. Our approach builds on existing techniques by combining flow matching with equivariant architectures and an optimal-transport-based coupling mechanism. We generalize equivariant flow matching to a symmetric coupling strategy that aligns predicted and target outputs under group actions, allowing accurate learning in equivariant settings. We validate our approach on a range of systems, from simple conceptual systems to physical problems such as buckling beams and the Allen-Cahn equation. The results demonstrate that the approach accurately captures multimodal distributions and symmetry-breaking bifurcations. Moreover, our results demonstrate that flow matching significantly outperforms non-probabilistic and variational methods. This offers a principled and scalable solution for modeling multistability in high-dimensional systems.
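
The two ingredients named above, a flow matching target and a symmetric coupling, can be sketched for the simplest symmetry group. The code assumes the common linear-interpolant form of flow matching and a per-pair alignment over a two-element group (identity and sign flip). The paper's actual coupling also involves optimal transport across a batch, which is omitted here, and all names are hypothetical.

```python
def align_target(x0, x1, group):
    """Return the symmetry-transformed copy of x1 that lies closest to x0."""

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return min((g(x1) for g in group), key=lambda y: sq_dist(x0, y))


def flow_matching_pair(x0, x1, t):
    """Linear interpolant x_t = (1 - t) x0 + t x1 and its target velocity x1 - x0."""
    x_t = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    velocity = [b - a for a, b in zip(x0, x1)]
    return x_t, velocity


# Z2 symmetry: a beam may buckle left or right with equal validity.
z2 = [lambda v: list(v), lambda v: [-c for c in v]]

noise = [-0.2]    # sample from the source distribution
target = [1.0]    # one observed buckled state
aligned = align_target(noise, target, z2)
x_t, velocity = flow_matching_pair(noise, aligned, t=0.5)
```

Aligning the target to the nearer mirror image keeps the interpolation path short and prevents the regression target from averaging the left and right branches, which is the failure mode of deterministic models that the abstract describes.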

2507.02921 2026-06-12 cs.LG cs.AI 版本更新

PlaceRep: Geospatial Place Representation Learning from Large-Scale Point-of-Interest Data

PlaceRep: 基于大规模兴趣点数据的地理空间场所表示学习

Mohammad Hashemi, Hossein Amiri, Andreas Zufle

发表机构 * Emory University(埃默里大学)

AI总结 提出PlaceRep方法,通过聚类空间和语义相关的兴趣点构建场所级表示,无需预训练即可高效生成多尺度城市区域嵌入,在人口密度估计和房价预测任务中优于现有方法并实现百倍加速。

详情
AI中文摘要

学习城市环境的有效表示需要捕捉超越固定行政边界的空间结构。现有的地理空间表示学习方法通常将兴趣点(POI)聚合到预定义的行政区域(如普查单元或邮政编码区域),为每个区域分配单个嵌入。然而,POI 通常形成跨越、包含或超出这些边界的语义上有意义的组,定义了更能反映人类活动和城市功能的场所。为解决这一局限性,我们提出 PlaceRep,一种通过聚类空间和语义相关的 POI 来构建场所级表示的地理空间表示学习方法。PlaceRep 从美国 Foursquare 数据中总结大规模 POI 图,生成通用城市区域嵌入,同时自动识别跨多个空间尺度的场所。通过消除模型预训练,PlaceRep 为多粒度地理空间分析提供了可扩展且高效的解决方案。使用人口密度估计和房价预测作为下游任务的实验表明,PlaceRep 优于大多数最先进的基于图的地理空间表示学习方法,并在大规模 POI 图上生成区域级表示时实现了高达 100 倍的加速。PlaceRep 的实现可在 https://github.com/mohammadhashemii/PlaceRep 获取。

英文摘要

Learning effective representations of urban environments requires capturing spatial structure beyond fixed administrative boundaries. Existing geospatial representation learning approaches typically aggregate Points of Interest (POIs) into pre-defined administrative regions such as census units or ZIP code areas, assigning a single embedding to each region. However, POIs often form semantically meaningful groups that extend across, within, or beyond these boundaries, defining places that better reflect human activity and urban function. To address this limitation, we propose PlaceRep, a geospatial representation learning method that constructs place-level representations by clustering spatially and semantically related POIs. PlaceRep summarizes large-scale POI graphs from U.S. Foursquare data to produce general-purpose urban region embeddings while automatically identifying places across multiple spatial scales. By eliminating model pre-training, PlaceRep provides a scalable and efficient solution for multi-granular geospatial analysis. Experiments on two downstream tasks, population density estimation and housing price prediction, show that PlaceRep outperforms most state-of-the-art graph-based geospatial representation learning methods and achieves up to a 100x speedup in generating region-level representations on large-scale POI graphs. The implementation of PlaceRep is available at https://github.com/mohammadhashemii/PlaceRep.

2601.15503 2026-06-12 cs.LG 版本更新

Data-driven Lake Water Quality Forecasting for Time Series with Missing Data using Machine Learning

基于机器学习的数据驱动湖泊水质时间序列缺失数据预测

Rishit Chatterjee, Tahiya Chowdhury

发表机构 * Department of Computer Science, Colby College(科尔比学院计算机科学系)

AI总结 针对志愿者监测导致的湖泊数据缺失问题,采用多重插补和岭回归,在30个湖泊数据集上实现透明度预测,并量化了最小样本量和特征集,提出联合可行性函数以优化监测策略。

详情
Journal ref
Published in: 2026 IEEE Conference on Technologies for Sustainability (SusTech)
Comments
8 pages, 4 figures, 3 tables
AI中文摘要

志愿者主导的湖泊监测产生不规则、季节性的时间序列,由于冰盖、天气相关的通行限制以及偶尔的人为错误,存在大量缺失数据,这给有害藻华预测和早期预警带来了困难。我们研究了基于来自缅因州湖泊三十年间原位记录的数据丰富子集(30个湖泊)的塞氏盘深度(SDD)预测。通过链式方程多重插补(MICE)处理缺失数据,并使用归一化平均绝对误差(nMAE)指标进行跨湖泊性能比较。在六种候选模型中,岭回归提供了最佳的平均测试性能。利用岭回归,我们量化了最小样本量,表明在向后近期历史协议下,模型平均每个湖泊约176个训练样本即可达到全历史准确率的5%以内。我们还确定了最小特征集,其中紧凑的四特征子集在相同5%容差内匹配了十三特征基线。综合这些结果,我们引入了一个联合可行性函数,该函数识别出达到完整历史、全特征基线5%以内目标所需的最小训练历史和最少预测变量。在我们的研究中,达到5%准确率目标需要每个湖泊约64个近期样本和仅一个预测变量,凸显了针对性监测的实用性。因此,我们的联合可行性策略在固定准确率目标下统一了近期历史长度和特征选择,为湖泊研究人员制定采样工作和测量优先级提供了简单高效的规则。

英文摘要

Volunteer-led lake monitoring yields irregular, seasonal time series with many gaps arising from ice cover, weather-related access constraints, and occasional human errors, complicating forecasting and early warning of harmful algal blooms. We study Secchi Disk Depth (SDD) forecasting on a 30-lake, data-rich subset drawn from three decades of in-situ records collected across Maine lakes. Missingness is handled via Multiple Imputation by Chained Equations (MICE), and we evaluate performance with a normalized Mean Absolute Error (nMAE) metric for cross-lake comparability. Among six candidates, ridge regression provides the best mean test performance. Using ridge regression, we then quantify the minimal sample size, showing that under a backward, recent-history protocol, the model reaches within 5% of full-history accuracy with approximately 176 training samples per lake on average. We also identify a minimal feature set, where a compact four-feature subset matches the thirteen-feature baseline within the same 5% tolerance. Bringing these results together, we introduce a joint feasibility function that identifies the minimal training history and fewest predictors sufficient to achieve the target of staying within 5% of the complete-history, full-feature baseline. In our study, meeting the 5% accuracy target required about 64 recent samples and just one predictor per lake, highlighting the practicality of targeted monitoring. Hence, our joint feasibility strategy unifies recent-history length and feature choice under a fixed accuracy target, yielding a simple, efficient rule for setting sampling effort and measurement priorities for lake researchers.
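The "within 5% of the full-history baseline" criterion above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the abstract does not say how nMAE is normalized, so normalizing by the mean observed value is an assumption, the 5% tolerance is read as relative to the baseline error, and the error values in the example are made up.

```python
def nmae(y_true, y_pred):
    """Mean absolute error divided by the mean observed value.
    (Assumed normalization; the abstract does not specify it.)"""
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    return mae / (sum(y_true) / len(y_true))

def minimal_history(errors_by_n, tolerance=0.05):
    """Smallest training-history length whose error stays within `tolerance`
    (relative) of the error obtained with the longest history."""
    baseline = errors_by_n[max(errors_by_n)]
    for n in sorted(errors_by_n):
        if errors_by_n[n] <= baseline * (1.0 + tolerance):
            return n

# Hypothetical nMAE values for one lake, keyed by number of recent samples.
errors = {32: 0.30, 64: 0.208, 128: 0.203, 512: 0.20}
print(nmae([4.0, 6.0], [5.0, 5.0]))
print(minimal_history(errors))
```

The same search can be run over feature subsets as well as history lengths, which is the idea behind a joint feasibility rule.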

2601.13591 2026-06-12 cs.AI cs.CL 版本更新

DSAEval: Evaluating Data Science Agents on a Wide Range of Real-World Data Science Problems

DSAEval:在广泛真实世界数据科学问题上评估数据科学智能体

Maojun Sun, Yifei Xie, Yue Wu, Ruijian Han, Binyan Jiang, Defeng Sun, Yancheng Yuan, Jian Huang

发表机构 * Department of Data Science and Artificial Intelligence, Hong Kong Polytechnic University(数据科学与人工智能系,香港理工大学) Department of Applied Mathematics, Hong Kong Polytechnic University(应用数学系,香港理工大学)

AI总结 提出包含641个真实数据科学问题的基准DSAEval,涵盖多模态环境感知、多查询交互和多维评估,系统评估13个先进LLM智能体,发现Claude-Sonnet-4.5综合最优,多模态感知提升视觉任务性能2.04%-11.30%。

详情
AI中文摘要

近期基于LLM的数据智能体旨在自动化从数据分析到深度学习的数据科学任务。然而,真实世界数据科学问题的开放性——通常跨越多个分类且缺乏标准答案——给评估带来了重大挑战。为此,我们引入了DSAEval,一个包含641个基于285个多样化数据集的真实世界数据科学问题的基准,涵盖结构化和非结构化数据(例如图像和文本)。DSAEval包含三个独特特征:(1)多模态环境感知,使智能体能够解释来自多种模态(包括文本和视觉)的观察;(2)多查询交互,反映真实世界数据科学项目的迭代和累积性质;(3)多维评估,提供跨推理、代码和结果的全面评估。我们使用DSAEval系统评估了13个近期先进的智能体LLM。结果表明,Claude-Sonnet-4.5实现了最强的整体性能,MiMo-V2-Pro在持续时间上领先,GPT-5.2在步骤效率上领先,而MiMo-V2-Flash最具成本效益。我们进一步证明,多模态感知持续提升视觉相关任务的性能,增益范围为2.04%至11.30%。总体而言,尽管当前数据科学智能体在结构化数据和常规数据分析工作流上表现良好,但在非结构化领域仍存在重大挑战。最后,我们提供了关键见解并概述了未来研究方向。

英文摘要

Recent LLM-based data agents aim to automate data science tasks ranging from data analysis to deep learning. However, the open-ended nature of real-world data science problems, which often span multiple taxonomies and lack standard answers, poses a significant challenge for evaluation. To address this, we introduce DSAEval, a benchmark comprising 641 real-world data science problems grounded in 285 diverse datasets, covering both structured and unstructured data (e.g., image and text). DSAEval incorporates three distinctive features: (1) Multimodal Environment Perception, which enables agents to interpret observations from multiple modalities, including text and vision; (2) Multi-Query Interactions, which mirror the iterative and cumulative nature of real-world data science projects; and (3) Multi-Dimensional Evaluation, which provides a holistic assessment across reasoning, code, and results. We systematically evaluate 13 recent advanced agentic LLMs using DSAEval. Our results show that Claude-Sonnet-4.5 achieves the strongest overall performance, MiMo-V2-Pro and GPT-5.2 lead in duration and step efficiency, respectively, and MiMo-V2-Flash is the most cost-effective. We further demonstrate that multimodal perception consistently improves performance on vision-related tasks, with gains ranging from 2.04% to 11.30%. Overall, while current data science agents perform well on structured data and routine data analysis workflows, substantial challenges remain in unstructured domains. Finally, we offer critical insights and outline future research directions.

2508.12681 2026-06-12 cs.RO cs.LG cs.SY eess.SY 版本更新

Adaptive Model-Predictive Control of a Soft Continuum Robot Using a Physics-Informed Neural Network Based on Cosserat Rod Theory

基于Cosserat杆理论物理信息神经网络的软体连续机器人自适应模型预测控制

Johann Licher, Max Bartholdt, Henrik Krauss, Tim-Lukas Habich, Thomas Seel, Moritz Schappler

发表机构 * Institute of Mechatronic Systems, Leibniz University Hannover(机电一体化系统研究所,汉诺威莱布尼茨大学) Department of Advanced Interdisciplinary Studies, The University of Tokyo(先进跨学科研究部,东京大学) Institute of Assembly Technology and Robotics, Leibniz University Hannover(装配技术与机器人研究所,汉诺威莱布尼茨大学)

AI总结 提出一种基于域解耦物理信息神经网络(DD-PINN)的实时非线性模型预测控制框架,实现软体连续机器人的高精度动态控制,位置误差低于3 mm。

详情
Comments
Submitted to IEEE Transactions on Robotics, 20 pages, 14 figures
AI中文摘要

软体连续机器人(SCR)的动态控制对其应用扩展具有巨大潜力,但由于精确动态模型的高计算需求,仍然是一个具有挑战性的问题。虽然已经提出了如Koopman算子方法等数据驱动方法,但它们通常缺乏自适应性,且无法重建完整的机器人形状,限制了其适用性。本文介绍了一种基于具有自适应弯曲刚度的域解耦物理信息神经网络(DD-PINN)的实时非线性模型预测控制(MPC)框架。DD-PINN作为动态Cosserat杆模型的替代模型,加速比高达44,000倍。它还被用于无迹卡尔曼滤波器中,从末端执行器位置测量中估计模型状态和弯曲柔度。我们在GPU上实现了一个以70 Hz运行的非线性进化MPC。在仿真中,它展示了动态轨迹的精确跟踪和设定点控制,末端执行器位置误差低于3 mm(执行器长度的2.3%)。在实际实验中,控制器实现了类似的精度和高达3.55 m/s²的加速度。

英文摘要

Dynamic control of soft continuum robots (SCRs) holds great potential for expanding their applications, but remains a challenging problem due to the high computational demands of accurate dynamic models. While data-driven approaches like Koopman-operator-based methods have been proposed, they typically lack adaptability and cannot reconstruct the full robot shape, limiting their applicability. This work introduces a real-time-capable nonlinear model-predictive control (MPC) framework for SCRs based on a domain-decoupled physics-informed neural network (DD-PINN) with adaptable bending stiffness. The DD-PINN serves as a surrogate for the dynamic Cosserat rod model with a speed-up factor of up to 44,000. It is also used within an unscented Kalman filter for estimating the model states and bending compliance from end-effector position measurements. We implement a nonlinear evolutionary MPC running at 70 Hz on the GPU. In simulation, it demonstrates accurate tracking of dynamic trajectories and setpoint control with end-effector position errors below 3 mm (2.3% of the actuator's length). In real-world experiments, the controller achieves similar accuracy and accelerations up to 3.55 m/s².
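Two implied quantities follow directly from the figures quoted above; the symbols $L$ (actuator length) and $T_{\text{cycle}}$ (time budget per control step) are introduced here for illustration and do not appear in the abstract. Assuming the 3 mm bound corresponds exactly to 2.3% of the actuator length:

```latex
L \approx \frac{3\ \text{mm}}{0.023} \approx 130\ \text{mm},
\qquad
T_{\text{cycle}} = \frac{1}{70\ \text{Hz}} \approx 14.3\ \text{ms}
```

So each MPC iteration, including all surrogate-model evaluations, has to fit into roughly 14 ms, which is why the surrogate's speed-up over the Cosserat rod model matters.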

2509.18085 2026-06-12 cs.LG cs.AI cs.CL 版本更新

Structuring The Future: Diffusion LLM Speculative Decoding via Calibrated Draft Graphs

构建未来:通过校准草稿图实现扩散LLM推测解码

Sudhanshu Agrawal, Risheek Garrepalli, Raghavv Goel, Christopher Lott, Fatih Porikli, Mingu Lee

发表机构 * University of Waterloo(滑铁卢大学)

AI总结 提出Spiffy算法,利用校准的草稿图结构实现扩散LLM的推测解码,在保持输出分布的同时加速推理,最高减少8.6倍模型推理次数并加速6.3倍令牌生成速率。

详情
Comments
Original version uploaded on Sep 22, 2025. (v2): Extended Table 2 with additional analysis and referenced it in Sec 5.2. (v3): Added note to Sec 4.2 and Appendix A.2 specifying conditions for losslessness. (v4): Updated with the version accepted to ICML 2026 workshops
AI中文摘要

扩散LLM(dLLM)最近作为自回归LLM(AR-LLM)的强大替代方案出现,具有以显著更高的令牌生成速率运行的潜力。为了释放这一潜力,我们提出了Spiffy,一种推测解码算法,用于加速dLLM推理,同时可证明地保持模型的输出分布。这项工作解决了将AR-LLM的推测解码思想应用于dLLM所涉及的独特挑战。Spiffy执行自动推测以消除独立草稿模型的开销,以新颖的有向草稿图形式构建草稿状态,以利用dLLM生成的双向、块状特性。这些草稿图离线校准以最大化接受率,并在推理过程中动态剪枝以提高计算效率。我们给出了Spiffy的详细公式,并展示了其与KV缓存和基于阈值的动态解掩码相结合,加速LLaDA、Dream和SDAR模型的能力,使模型推理次数减少高达8.6倍,令牌速率加速高达6.3倍。

英文摘要

Diffusion LLMs (dLLMs) have recently emerged as a powerful alternative to autoregressive LLMs (AR-LLMs) with the potential to operate at significantly higher token-generation rates. To unlock this potential, we present Spiffy, a speculative decoding algorithm to accelerate dLLM inference while provably preserving the model's output distribution. This work addresses the unique challenges involved in applying ideas from speculative decoding of AR-LLMs to dLLMs. Spiffy performs auto-speculation to eliminate the overheads of an independent draft model, structuring draft states in the form of a novel directed draft graph to take advantage of the bidirectional, blockwise nature of dLLM generation. These draft graphs are calibrated offline to maximize acceptance rates and are dynamically pruned during inference for improved computational efficiency. We present a detailed formulation of Spiffy and demonstrate its ability to accelerate LLaDA, Dream, and SDAR models in combination with KV caching and threshold-based dynamic unmasking, leading to up to an 8.6× reduction in model inferences and a 6.3× acceleration in token rate.

2601.06227 2026-06-12 cs.LG cs.AI 版本更新

When Smaller Wins: Dual-Stage Distillation and Pareto-Guided Compression of Liquid Neural Networks for Edge Battery Prognostics

当更小胜出:面向边缘电池健康预测的液态神经网络双阶段蒸馏与帕累托引导压缩

Dhivya Dharshini Kannan, Wei Li, Wei Zhang, Jianbiao Wang, Zhi Wei Seh, Man-Fai Ng

发表机构 * Singapore Institute of Technology(新加坡科技学院) Institute of Materials Research and Engineering(材料研究与工程研究所) Agency for Science, Technology and Research(科技研究局) Institute of High Performance Computing(高性能计算研究所)

AI总结 提出DLNet框架,通过欧拉离散化、双阶段知识蒸馏和帕累托引导压缩,将高容量液态神经网络压缩为边缘可部署模型,在电池健康预测中实现小模型超越大模型。

详情
Comments
Accepted at International Conference on Pattern Recognition, ICPR 2026. Code available at: https://github.com/Dhivya-DD17/DLNet
AI中文摘要

电池管理系统日益需要在严格的设备端约束下进行准确的电池健康预测。本文提出DLNet,一个实用的双阶段液态神经网络蒸馏框架,将高容量模型转化为紧凑且可边缘部署的电池健康预测模型。DLNet首先应用欧拉离散化重新表述液态动力学以实现嵌入式兼容性。然后进行双阶段知识蒸馏,以传递教师模型的时间行为,并在进一步压缩后恢复该行为。在联合误差-成本目标下的帕累托引导选择保留了平衡准确性和效率的学生模型。我们在广泛使用的数据集上评估DLNet,并在Arduino Nano 33 BLE Sense上使用int8部署验证实际设备可行性。最终部署的学生模型在预测未来100个周期的电池健康时实现了0.0066的低误差,比教师模型低15.4%。模型大小从616 kB减少到94 kB,减少了84.7%,在设备上每次推理耗时21毫秒。这些结果支持了一个实用的“更小胜出”观察:通过适当的监督和选择,小模型可以在边缘预测中匹配或超越大模型。除了电池,DLNet框架可以扩展到其他具有严格硬件约束的工业分析任务。

英文摘要

Battery management systems increasingly require accurate battery health prognostics under strict on-device constraints. This paper presents DLNet, a practical framework with dual-stage distillation of liquid neural networks that turns a high-capacity model into compact and edge-deployable models for battery health prediction. DLNet first applies Euler discretization to reformulate liquid dynamics for embedded compatibility. It then performs dual-stage knowledge distillation to transfer the teacher model's temporal behavior and recover it after further compression. Pareto-guided selection under joint error-cost objectives retains student models that balance accuracy and efficiency. We evaluate DLNet on a widely used dataset and validate real-device feasibility on an Arduino Nano 33 BLE Sense using int8 deployment. The final deployed student achieves a low error of 0.0066 when predicting battery health over the next 100 cycles, which is 15.4% lower than the teacher model. It reduces the model size from 616 kB to 94 kB, an 84.7% reduction, and takes 21 ms per inference on the device. These results support a practical "smaller wins" observation: with proper supervision and selection, a small model can match or exceed a large teacher for edge-based prognostics. Beyond batteries, the DLNet framework can extend to other industrial analytics tasks with strict hardware constraints.
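To make the Euler discretization step concrete, here is a minimal sketch of one explicit Euler update for a single liquid time-constant unit. It assumes the commonly cited liquid time-constant form dx/dt = -(1/tau + f) * x + f * A with a sigmoid gate f; the abstract does not give DLNet's exact dynamics, and every parameter name and value below is hypothetical.

```python
import math

def ltc_euler_step(x, inp, dt, tau=1.0, A=1.0, w=0.5, u=1.0, b=0.0):
    """One explicit Euler step of dx/dt = -(1/tau + f) * x + f * A,
    where f = sigmoid(w*x + u*inp + b) is an input-dependent gate.
    (Assumed formulation; parameters are illustrative, not from the paper.)"""
    f = 1.0 / (1.0 + math.exp(-(w * x + u * inp + b)))
    dxdt = -(1.0 / tau + f) * x + f * A
    return x + dt * dxdt

def rollout(inputs, dt=0.1, x0=0.0):
    """Unroll the discretized cell over an input sequence."""
    x, states = x0, []
    for inp in inputs:
        x = ltc_euler_step(x, inp, dt)
        states.append(x)
    return states

states = rollout([1.0] * 200)
print(states[-1])
```

A fixed-step update like this needs only multiply-adds and one sigmoid per step and no ODE solver, which is the property that makes such a reformulation attractive on a microcontroller.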

2601.03184 2026-06-12 cs.LG cs.AI 版本更新

Decentralized Autoregressive Generation

分散自回归生成

Stepan Maschan, Haoxuan Qu, Jun Liu

发表机构 * Lancaster University(兰卡斯特大学)

AI总结 本文通过离散流匹配框架证明分散训练与集中训练在理论上等价,实验验证其在多模态基准上保持竞争力。

详情
AI中文摘要

近年来,自回归生成的分散化作为解决扩展瓶颈的方案引起了广泛关注。然而,尽管有令人鼓舞的实验结果,这一范式目前缺乏严格的理论证明。在这项工作中,我们正式建立了分散训练与集中训练之间的理论等价性。为此,我们调整了离散流匹配框架用于自回归生成,利用其固有性质证明全局模型自然分解为独立专家。最后,我们在多种多模态基准上进行了大量实验,实验验证了分散训练在标准集中架构上保持竞争性。

英文摘要

The decentralization of autoregressive generation has attracted considerable attention in recent years as a solution to scaling bottlenecks. However, despite promising empirical results, this paradigm currently lacks rigorous theoretical justification. In this work, we formally establish the theoretical equivalence between decentralized and centralized training. To achieve this, we adapt the Discrete Flow Matching framework for autoregressive generation, leveraging its inherent properties to demonstrate that global models naturally decompose into independent experts. Finally, we conduct extensive experiments across diverse multimodal benchmarks, empirically validating that decentralized training maintains competitive parity with standard centralized architectures.

2601.06279 2026-06-12 cs.CV 版本更新

EyeTheia: A Lightweight and Accessible Eye-Tracking Toolbox

EyeTheia:一个轻量级且易用的眼动追踪工具箱

Stevenson Pather, Niels Martignène, Arnaud Bugnet, Fouad Boutaleb, Fabien D'Hondt, Deise Santana Maia

发表机构 * Univ. Lille, Inserm, CHU Lille, U1172 - LilNCog - Lille Neuroscience & Cognition(里尔大学、法国国家医学研究院、里尔大学医院、U1172 - 里尔神经科学与认知中心) Univ. Lille, CNRS, Centrale Lille, UMR 9189 CRIStAL(里尔大学、法国国家科学研究中心、里尔中央理工大学、UMR 9189 CRIStAL) Centre national de ressources et de résilience (CN2R)(资源与韧性国家研究中心)

AI总结 提出基于网络摄像头的轻量级眼动追踪管道EyeTheia,结合MediaPipe特征提取和CNN模型,通过用户微调降低预测误差,在点探测任务中与商业方案表现一致。

详情
Comments
Code for EyeTheia: https://github.com/patherstevenson/EyeTheia. Experimental platform for the cognitive neuroscience task (BAWEB IAPS): https://git.interactions-team.fr/INTERACTIONS/calypso/src/branch/main/src/medita/
AI中文摘要

我们介绍了EyeTheia,一个用于基于网络摄像头的视线估计的轻量级开源深度学习管道,专为基于浏览器的实验平台和现实世界的认知与临床研究设计。EyeTheia仅使用标准笔记本电脑摄像头即可实现实时视线追踪,结合基于MediaPipe的关键点提取和受iTracker启发的卷积神经网络,并支持可选的用户特定微调。我们研究了两种互补策略:适配在移动数据上预训练的模型,以及在面向桌面的数据集上从头训练相同架构。在MPIIFaceGaze上的验证结果显示,在标定前两种方法性能相当,而轻量级的用户特定微调持续降低了视线预测误差。我们还在一个真实的点探测任务中评估了EyeTheia,并与商业网络摄像头追踪器SeeSo SDK进行了比较。结果表明,在刺激呈现期间左右视线分配上具有高度一致性,尽管时间变异性更高。总体而言,EyeTheia为低成本视线追踪提供了一个透明且可扩展的解决方案,适用于可扩展和可重复的实验与临床研究。代码、训练模型和实验材料均已公开。

英文摘要

We introduce EyeTheia, a lightweight and open deep learning pipeline for webcam-based gaze estimation, designed for browser-based experimental platforms and real-world cognitive and clinical research. EyeTheia enables real-time gaze tracking using only a standard laptop webcam, combining MediaPipe-based landmark extraction with a convolutional neural network inspired by iTracker and optional user-specific fine-tuning. We investigate two complementary strategies: adapting a model pretrained on mobile data and training the same architecture from scratch on a desktop-oriented dataset. Validation results on MPIIFaceGaze show comparable performance between both approaches prior to calibration, while lightweight user-specific fine-tuning consistently reduces gaze prediction error. We further evaluate EyeTheia in a realistic Dot-Probe task and compare it to the commercial webcam-based tracker SeeSo SDK. Results indicate strong agreement in left-right gaze allocation during stimulus presentation, despite higher temporal variability. Overall, EyeTheia provides a transparent and extensible solution for low-cost gaze tracking, suitable for scalable and reproducible experimental and clinical studies. The code, trained models, and experimental materials are publicly available.

2601.04885 2026-06-12 cs.CL cs.AI cs.LG 版本更新

CuMA: Aligning LLMs with Sparse Cultural Values via Demographic-Aware Mixture of Adapters

CuMA: 通过人口统计感知的适配器混合使大语言模型与稀疏文化价值观对齐

Ao Sun, Xiaoyu Wang, Zhe Tan, Yu Li, Jiachen Zhu, Shu Su, Yuheng Jia

发表机构 * Southeast University(东南大学) ByteDance Inc.(字节跳动公司) Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China(新一代人工智能技术及其交叉应用重点实验室(东南大学),中华人民共和国教育部,中国)

AI总结 提出CuMA框架,通过人口统计感知路由将冲突梯度分离到专家子空间,解决密集模型在多文化对齐中的均值崩溃问题,在WorldValuesBench等基准上取得最优性能。

详情
Comments
ACL 2026 Main
AI中文摘要

随着大语言模型服务于全球用户,对齐必须从强制执行普遍共识转向尊重文化多元主义。我们证明,密集模型在被迫适应冲突的价值分布时会出现均值崩溃,收敛到无法代表不同群体的通用平均值。我们将其归因于文化稀疏性,其中梯度干扰阻止密集参数跨越不同的文化模式。为解决此问题,我们提出CuMA(Cultural Mixture of Adapters),一个将对齐视为条件容量分离问题的框架。通过引入人口统计感知路由,CuMA内化了一个潜在文化拓扑,以将冲突梯度明确解耦到专门的专家子空间中。在WorldValuesBench、Community Alignment和PRISM上的广泛评估表明,CuMA达到了最先进的性能,显著优于密集基线和仅语义MoE。关键的是,我们的分析证实CuMA有效缓解了均值崩溃,保留了文化多样性。我们的代码可在 https://github.com/Throll/CuMA 获取。

英文摘要

As Large Language Models (LLMs) serve a global audience, alignment must transition from enforcing universal consensus to respecting cultural pluralism. We demonstrate that dense models, when forced to fit conflicting value distributions, suffer from Mean Collapse, converging to a generic average that fails to represent diverse groups. We attribute this to Cultural Sparsity, where gradient interference prevents dense parameters from spanning distinct cultural modes. To resolve this, we propose CuMA (Cultural Mixture of Adapters), a framework that frames alignment as a conditional capacity separation problem. By incorporating demographic-aware routing, CuMA internalizes a Latent Cultural Topology to explicitly disentangle conflicting gradients into specialized expert subspaces. Extensive evaluations on WorldValuesBench, Community Alignment, and PRISM demonstrate that CuMA achieves state-of-the-art performance, significantly outperforming both dense baselines and semantic-only MoEs. Crucially, our analysis confirms that CuMA effectively mitigates mean collapse, preserving cultural diversity. Our code is available at https://github.com/Throll/CuMA.

2507.07947 2026-06-12 cs.LG cs.AI 版本更新

Reconstructing Template-Memorized Images from Natural Prompts

从自然提示中重建模板记忆的图像

Sol Yarkoni, Mahmood Sharif, Roi Livni

发表机构 * School of Electrical & Computer Engineering(电气与计算机工程学院) School of Computer Science & AI(计算机科学与人工智能学院) Tel Aviv University(特拉维夫大学)

AI总结 提出一种低资源攻击方法,利用模板化电商数据中的模式,从自然提示中重建训练集中的记忆图像,揭示隐私风险。

详情
AI中文摘要

生成模型(如扩散模型)的最新进展引发了与隐私、版权侵犯和数据管理相关的担忧。为了更好地理解和控制这些风险,先前的工作引入了从训练数据中重建图像或部分图像的技术和攻击。虽然这些结果表明训练数据可以被恢复,但现有方法通常依赖于高计算资源、对训练集的部分访问或精心设计的提示。在这项工作中,我们提出了一种新的攻击,该攻击需要低资源,假设对训练数据几乎没有或完全没有访问权限,并识别出看似良性的提示,这些提示可能导致潜在有风险的图像重建。我们进一步表明,即使对于没有专业知识的用户,这种重建也可能无意中发生。例如,我们观察到,对于某一现有模型,提示“blue Unisex T-Shirt”(蓝色男女通用T恤)会生成一个真实个体的面部。此外,通过将已识别的漏洞与真实世界的提示数据相结合,我们发现了能够重现记忆视觉元素的提示。我们的方法建立在先前工作的见解之上,并利用领域知识来揭示由于使用抓取的电商数据而产生的基本漏洞,其中模板化布局和图像与模式化的文本提示紧密相关。我们的攻击代码已在 https://github.com/TheSolY/lr-tmi 公开。

英文摘要

Recent advances in generative models, such as diffusion models, have raised concerns related to privacy, copyright infringement, and data stewardship. To better understand and control these risks, prior work has introduced techniques and attacks that reconstruct images, or parts of images, from training data. While these results demonstrate that training data can be recovered, existing methods often rely on high computational resources, partial access to the training set, or carefully engineered prompts. In this work, we present a new attack that requires low resources, assumes little to no access to the training data, and identifies seemingly benign prompts that can lead to potentially risky image reconstruction. We further show that such reconstructions may occur unintentionally, even for users without specialized knowledge. For example, we observe that for one existing model, the prompt "blue Unisex T-Shirt" generates the face of a real individual. Moreover, by combining the identified vulnerabilities with real-world prompt data, we discover prompts that reproduce memorized visual elements. Our approach builds on insights from prior work and leverages domain knowledge to expose a fundamental vulnerability arising from the use of scraped e-commerce data, where templated layouts and images are closely tied to pattern-like textual prompts. The code for our attack is publicly available at https://github.com/TheSolY/lr-tmi.

2601.02177 2026-06-12 cs.CV cs.CR 版本更新

Why Commodity WiFi Sensors Fail at Multi-Person Gait Identification: A Systematic Analysis Using ESP32

为什么商用WiFi传感器在多人体步态识别中失败:基于ESP32的系统分析

Oliver Custance, Saad Khan, Simon Parkinson

发表机构 * University of Cambridge(剑桥大学)

AI总结 通过ESP32实验发现,多人体步态识别性能差主要源于商用WiFi的感知质量限制,而非算法选择。

详情
AI中文摘要

WiFi信道状态信息(CSI)在单人步态识别中展现出潜力,引发了对其在非接触式生物识别、持续认证和被动识别中应用的兴趣。然而,在低成本商用设备上进行多人识别的可行性仍不清楚。一个关键问题是,较差的多人性能主要是算法限制,还是反映了商用WiFi硬件更根本的感知上限。我们通过使用商用ESP32 WiFi传感器的系统实证研究来回答这个问题。我们评估了六种不同的信号分离方法——FastICA、SOBI、PCA-ICA、NMF、小波和张量分解——在七个场景中,覆盖1-10人,包括受控和现实室内环境。为了超越分类准确率进行研究,我们引入了三个诊断指标:受试者内变异性(ISV)、受试者间可区分性(ISD)和性能退化率(PDR)。所有方法的性能均中等(39%-56%准确率),几乎没有证据表明仅靠算法选择能解决问题。表现最佳的方法NMF达到56%准确率,而所有方法都表现出极高的特征空间重叠(97%-99%)、不稳定的受试者内表示以及显著的环境敏感性。这些发现表明,在商用ESP32 CSI约束下,密集多人步态识别更多受限于感知质量和空间多样性,而非所选分离算法。我们的结果对安全和隐私有直接影响:它们质疑了商用WiFi CSI作为稳健的多用户生物识别基元的实用性,同时也对低成本现成WiFi硬件可实现的被动识别能力施加了重要限制。

英文摘要

WiFi Channel State Information (CSI) has shown promise for single-person gait identification, raising interest in its use for contactless biometrics, continuous authentication, and passive identification. However, the feasibility of multi-person identification on low-cost commodity devices remains unclear. A critical question is whether weak multi-person performance is primarily an algorithmic limitation, or whether it reflects a more fundamental sensing ceiling on commodity WiFi hardware. We address this question through a systematic empirical study using commodity ESP32 WiFi sensors. We evaluate six signal separation methods (FastICA, SOBI, PCA-ICA, NMF, Wavelet, and Tensor decomposition) across seven scenarios spanning 1 to 10 people in both controlled and realistic indoor environments. To investigate beyond classification accuracy, we introduce three diagnostic metrics: intra-subject variability (ISV), inter-subject distinguishability (ISD), and performance degradation rate (PDR). Across all methods, performance remains moderate (39%-56% accuracy), with limited evidence that algorithmic choice alone solves the problem. The best-performing method, NMF, reaches 56% accuracy, while all methods exhibit extremely high feature-space overlap (97%-99%), unstable within-subject representations, and marked environmental sensitivity. These findings suggest that, under commodity ESP32 CSI constraints, dense multi-person gait identification is limited more by sensing quality and spatial diversity than by the chosen separation algorithm. Our results have direct implications for security and privacy: they call into question the practicality of commodity WiFi CSI as a robust multi-user biometric primitive for authentication, while also placing important bounds on the passive identification capabilities achievable with low-cost off-the-shelf WiFi hardware.

2304.13836 2026-06-12 cs.LG cs.AI cs.CV stat.ME 版本更新

On Pitfalls of RemOve-And-Retrain: Data Processing Inequality Perspective

论 RemOve-And-Retrain 的陷阱:数据处理不等式视角

Junhwa Song, Keumgang Cha, Junghoon Seo

发表机构 * KAIST(韩国科学技术院)

AI总结 从信息论角度揭示ROAR基准的缺陷:数据无关的后处理可提升ROAR分数,导致对归因图信息量的误判,并发现模糊性偏差。

详情
Comments
Accepted at the 2026 ICML Workshop on Mechanistic Interpretability
AI中文摘要

RemOve-And-Retrain (ROAR) 基准被广泛用于评估特征归因方法,但其有效性尚未从信息论角度得到充分探索。我们证明,对归因图进行模型和数据无关的后处理(根据数据处理不等式,这些变换不能增加关于决策函数的信息)通常可以改善ROAR分数。这意味着ROAR排名的提升本身并不能证明归因图携带更多关于模型的信息。我们将这种失败模式归因于对空间模糊掩膜的偏好。在CIFAR-10、SVHN和CUB-200上的实验显示,模糊度与ROAR性能之间存在一致的关联,这种模式也出现在ROAD变体中。我们为更谨慎的基于移除的基准测试提供了指导方针,这对验证关于神经网络内部机制的机制性理解具有重要意义。

英文摘要

The RemOve-And-Retrain (ROAR) benchmark is widely used to evaluate feature attribution methods, yet its validity remains underexplored from an information-theoretic perspective. We show that model- and data-agnostic post-processing of attribution maps (transformations that, by the data processing inequality, cannot add information about the decision function) can often improve ROAR scores. This means that an improved ROAR ranking is not, by itself, evidence that an attribution map carries more information about the model. We trace this failure mode to a bias toward spatially blurry masks. Experiments on CIFAR-10, SVHN, and CUB-200 show a consistent association between blurriness and ROAR performance, a pattern that also appears in the ROAD variant. We provide guidelines for more cautious removal-based benchmarking, with implications for validating mechanistic understanding of neural network internals.
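The data processing inequality invoked above can be written out explicitly. The symbols are introduced here for illustration and are not taken from the paper: $f$ is the model's decision function, $A$ an attribution map computed from it, and $g$ a post-processing step that depends on neither the model nor the data (for example, a fixed blur).

```latex
% Markov chain: the post-processed map g(A) depends on f only through A
f \;\longrightarrow\; A \;\longrightarrow\; g(A)
\quad\Longrightarrow\quad
I\bigl(f;\, g(A)\bigr) \;\le\; I\bigl(f;\, A\bigr)
```

If $g(A)$ nevertheless scores higher than $A$ on ROAR, the gain cannot come from added information about $f$, which is the abstract's argument for attributing it to a bias in the benchmark.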

2509.07150 2026-06-12 cs.LG cond-mat.mtrl-sci 版本更新

PLaID++: A Preference Aligned Language Model for Targeted Inorganic Materials Design

PLaID++: 一种用于定向无机材料设计的偏好对齐语言模型

Andy Xu, Rohan Desai, Larry Wang, Ethan Ritz, Gabriel Hope

发表机构 * University of California, Berkeley(加州大学伯克利分校) Stanford University(斯坦福大学)

AI总结 提出PLaID++,通过对称性感知的Wyckoff文本表示和温度缩放熵正则化,结合可验证奖励的强化学习,实现稳定、新颖且满足空间群属性的晶体生成,比先前方法效率提高约50%。

详情
Comments
Code available at https://github.com/andaero/PLaID, model weights at https://huggingface.co/HOPE-Lab-HMC/PLaID
AI中文摘要

基于可验证奖励的强化学习(RLVR)已成为提高LLM正确性的有前景方法,然而在许多科学问题中,目标并非产生正确答案,而是产生满足一组约束的多样化候选方案。我们在材料生成背景下研究这一挑战。为此,我们引入了PLaID++,一个经过后训练的LLM,用于稳定且属性引导的晶体生成。我们发现性能取决于我们的晶体学表示和奖励公式。首先,我们引入了一种紧凑的、对称性感知的Wyckoff文本表示,提高了计算效率并鼓励从物理先验中泛化。其次,我们证明了温度缩放作为熵正则化器,可以抵消模式坍塌并鼓励探索。通过将对称性约束直接编码到文本中,并将模型输出引导至理想的化学空间,PLaID++生成热力学稳定、独特且新颖的结构,其速率比先前方法高约50%,并能条件性地生成具有所需空间群属性的结构。我们的工作展示了将自然语言处理中的后训练技术适应于材料设计的潜力,为定向和高效发现新材料铺平了道路。

英文摘要

Reinforcement Learning from Verifiable Rewards (RLVR) has emerged as a promising approach to improve correctness in LLMs; however, in many scientific problems, the objective is not necessarily to produce the correct answer, but instead to produce a diverse array of candidates which satisfy a set of constraints. We study this challenge in the context of materials generation. To this end, we introduce PLaID++, an LLM post-trained for stable and property-guided crystal generation. We find that performance hinges on our crystallographic representation and reward formulation. First, we introduce a compact, symmetry-informed Wyckoff text representation which improves computational efficiency and encourages generalization from physical priors. Second, we demonstrate that temperature scaling acts as an entropy regularizer which counteracts mode collapse and encourages exploration. By encoding symmetry constraints directly into text and guiding model outputs towards desirable chemical space, PLaID++ generates structures that are thermodynamically stable, unique, and novel at a ~50% greater rate than prior methods and conditionally generates structures with desired space group properties. Our work demonstrates the potential of adapting post-training techniques from natural language processing to materials design, paving the way for targeted and efficient discovery of novel materials.
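The link between temperature and entropy mentioned above can be seen in a generic softmax example. This sketch only shows the standard effect of dividing logits by a temperature before sampling; it is not the paper's training procedure, and the logits are made up.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits / temperature, with the usual max-shift for stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

logits = [2.0, 1.0, 0.0]  # hypothetical next-token scores
entropies = {t: entropy(softmax_with_temperature(logits, t)) for t in (0.5, 1.0, 2.0)}
for t, h in entropies.items():
    print(t, round(h, 4))
```

Higher temperatures flatten the distribution and raise its entropy toward the uniform limit log(3), so sampled candidates are more varied; lower temperatures concentrate mass on the top choice.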

2511.23030 2026-06-12 cs.RO cs.CV 版本更新

DiskChunGS: Large-Scale 3D Gaussian SLAM Through Chunk-Based Memory Management

DiskChunGS:基于分块内存管理的大规模3D高斯SLAM

Casimir Feldmann, Maximum Wilder-Smith, Vaishakh Patil, Michael Oechsle, Michael Niemeyer, Keisuke Tateno, Marco Hutter

发表机构 * Robotic Systems Lab, ETH Zurich(机器人系统实验室,瑞士苏黎世联邦理工学院) Google(谷歌)

AI总结 提出DiskChunGS,通过将场景划分为空间块并将非活跃区域存储于磁盘,突破GPU内存限制,实现大规模3D高斯SLAM,在多个数据集上完成全序列重建并提升视觉质量。

详情
Journal ref
IEEE Robotics and Automation Letters, vol. 11, no. 4, 2026
AI中文摘要

近期3D高斯溅射(3DGS)的进展在实时渲染的新视角合成中展现了令人印象深刻的结果。然而,将3DGS与SLAM系统集成面临根本的可扩展性限制:方法受限于GPU内存容量,只能重建小规模环境。我们提出DiskChunGS,一种可扩展的3DGS SLAM系统,通过一种核外(out-of-core)方法克服这一瓶颈,该方法将场景划分为空间块,并在GPU内存中仅维护活跃区域,同时将非活跃区域存储在磁盘上。我们的架构与现有的用于位姿估计和闭环检测的SLAM框架无缝集成,实现大规模全局一致的重建。我们在室内场景(Replica、TUM-RGBD)、城市驾驶场景(KITTI)以及资源受限的Nvidia Jetson平台上验证了DiskChunGS。我们的方法是唯一完成全部11个KITTI序列且未出现内存故障的方法,同时实现了卓越的视觉质量,证明了算法创新可以克服先前限制3DGS SLAM方法的内存约束。

英文摘要

Recent advances in 3D Gaussian Splatting (3DGS) have demonstrated impressive results for novel view synthesis with real-time rendering capabilities. However, integrating 3DGS with SLAM systems faces a fundamental scalability limitation: methods are constrained by GPU memory capacity, restricting reconstruction to small-scale environments. We present DiskChunGS, a scalable 3DGS SLAM system that overcomes this bottleneck through an out-of-core approach that partitions scenes into spatial chunks and maintains only active regions in GPU memory while storing inactive areas on disk. Our architecture integrates seamlessly with existing SLAM frameworks for pose estimation and loop closure, enabling globally consistent reconstruction at scale. We validate DiskChunGS on indoor scenes (Replica, TUM-RGBD), urban driving scenarios (KITTI), and resource-constrained Nvidia Jetson platforms. Our method uniquely completes all 11 KITTI sequences without memory failures while achieving superior visual quality, demonstrating that algorithmic innovation can overcome the memory constraints that have limited previous 3DGS SLAM methods.
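The out-of-core idea described above (spatial chunks, with only active ones resident in fast memory) can be sketched with a small cache. This is an illustration, not the DiskChunGS implementation: the class name, the cubic chunk grid, and the least-recently-used eviction policy are all assumptions, and plain dictionaries stand in for GPU memory and disk.

```python
from collections import OrderedDict

class ChunkCache:
    """Keeps at most `capacity` spatial chunks active and spills the rest to 'disk'.
    Eviction is least-recently-used (an assumed policy)."""

    def __init__(self, capacity, chunk_size=10.0):
        self.capacity = capacity
        self.chunk_size = chunk_size
        self.active = OrderedDict()  # chunk id -> points (stand-in for GPU memory)
        self.disk = {}               # chunk id -> points (stand-in for disk storage)

    def chunk_id(self, x, y, z):
        """Map a 3D position to the integer index of its cubic chunk."""
        s = self.chunk_size
        return (int(x // s), int(y // s), int(z // s))

    def _load(self, cid):
        """Make chunk `cid` active, evicting older chunks to disk if needed."""
        if cid in self.active:
            self.active.move_to_end(cid)
        else:
            self.active[cid] = self.disk.pop(cid, [])
            while len(self.active) > self.capacity:
                old_id, old_points = self.active.popitem(last=False)
                self.disk[old_id] = old_points
        return self.active[cid]

    def add_point(self, x, y, z):
        self._load(self.chunk_id(x, y, z)).append((x, y, z))

cache = ChunkCache(capacity=2)
for p in [(1, 1, 1), (15, 1, 1), (25, 1, 1), (2, 2, 2)]:
    cache.add_point(*p)
print(sorted(cache.active), sorted(cache.disk))
```

In a real system the active set would more likely be driven by the camera frustum than by access recency, and the spilled chunks would be serialized to files; the sketch only shows the bookkeeping.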

2510.16928 2026-06-12 cs.CL 版本更新

ChiKhaPo: A Large-Scale Multilingual Benchmark for Evaluating Lexical Comprehension and Generation in Large Language Models

ChiKhaPo: 一个用于评估大型语言模型词汇理解与生成能力的大规模多语言基准

Emily Chang, Niyati Bafna

发表机构 * Toyota Technological Institute at Chicago(芝加哥丰田技术研究所) Johns Hopkins University, Center for Language and Speech Processing(约翰霍普金斯大学语言与语音处理中心)

AI总结 针对现有基准语言覆盖不足且侧重高阶任务的问题,提出ChiKhaPo基准,包含8个子任务,覆盖2700+种语言,评估LLM的词汇理解与生成能力,发现6个SOTA模型表现不佳。

详情
AI中文摘要

现有的大型语言模型(LLM)基准主要局限于高资源或中资源语言,并且通常评估推理和生成方面的高阶任务性能。然而,大量证据表明,LLM在全球3800多种书面语言中的绝大多数语言中缺乏基本的语言能力。我们引入了ChiKhaPo,它包含8个难度不同的子任务,旨在评估生成模型的词汇理解和生成能力。ChiKhaPo利用现有的词典、单语数据和双语文本,为2个子任务提供了2700多种语言的覆盖,在语言覆盖范围上超过了任何现有基准。我们进一步展示了6个SOTA模型在我们的基准上表现不佳,并讨论了影响性能分数的因素,包括语系、语言资源丰富度、任务以及理解与生成方向。通过ChiKhaPo,我们希望促进并鼓励对LLM进行大规模多语言基准测试。

英文摘要

Existing benchmarks for large language models (LLMs) are largely restricted to high- or mid-resource languages, and often evaluate performance on higher-order tasks in reasoning and generation. However, plenty of evidence points to the fact that LLMs lack basic linguistic competence in the vast majority of the world's 3800+ written languages. We introduce ChiKhaPo, consisting of 8 subtasks of varying difficulty designed to evaluate the lexical comprehension and generation abilities of generative models. ChiKhaPo draws on existing lexicons, monolingual data, and bitext, and provides coverage for 2700+ languages for 2 subtasks, surpassing any existing benchmark in terms of language coverage. We further show that 6 SOTA models struggle on our benchmark, and discuss the factors contributing to performance scores, including language family, language resourcedness, task, and comprehension versus generation directions. With ChiKhaPo, we hope to enable and encourage the massively multilingual benchmarking of LLMs.