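arXivDaily: a daily academic digest synced with the full arXiv data feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.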

2209.01378 2026-06-18 cs.LG eess.SP q-fin.ST Updated version

Self-Evolving Multi-Agent Systems via Textual Backpropagation

Xiaowen Ma, Yunpu Ma, Chenyang Lin, Sikuan Yan, Jinhe Bi, Zixuan Cao, Yijun Tian, Volker Tresp, Hinrich Schuetze

Affiliations: Ludwig Maximilian University of Munich; Technical University of Munich; Munich Center for Machine Learning; University of Notre Dame

AI summary: Proposes the Agentic Neural Network framework, which models multi-agent collaboration as a layered neural network; forward task decomposition and backward feedback propagation let agents self-evolve their roles, prompts, and coordination, and the framework outperforms existing methods on seven benchmark datasets.

Abstract

Leveraging multiple Large Language Models (LLMs) has proven effective for addressing complex, high-dimensional tasks, but current approaches often rely on static, manually engineered multi-agent configurations. To overcome these constraints, we present the Agentic Neural Network (ANN), a framework that conceptualizes multi-agent collaboration as a layered neural network architecture. In this design, each agent operates as a node, and each layer forms a cooperative team focused on a specific subtask. Our framework follows a two-phase optimization strategy: (1) Forward Phase - Drawing inspiration from neural network forward passes, tasks are dynamically decomposed into subtasks, and cooperative agent teams with suitable aggregation methods are constructed layer by layer. (2) Backward Phase - Mirroring backpropagation, we refine both global and local collaboration through iterative feedback, allowing agents to self-evolve their roles, prompts, and coordination. This neuro-symbolic approach enables our framework to create new or specialized agent teams post-training, delivering notable gains in accuracy and adaptability. Across seven benchmark datasets, our work surpasses leading multi-agent baselines under the same configurations, showing consistent performance improvements.

2507.01414 2026-06-18 cs.LG Updated version

Decomposing Prediction Mechanisms for In-Context Recall

Sultan Daniels, Dylan Davis, Dhruv Gautam, Wentinn Liao, Gireeja Ranade, Anant Sahai

Affiliations: University of California, Berkeley; University of Pennsylvania

AI summary: By designing a new toy problem that combines continuous in-context learning with discrete associative recall, the authors find that transformers use two separate mechanisms with different learning dynamics for in-context recall: one performs associative recall using discrete symbolic labels, and the other makes Bayesian-style predictions from the previous token and the context.

Comments: 45 pages, 47 figures, 2 tables

Abstract

We introduce a new family of toy problems that combine features of linear-regression-style continuous in-context learning (ICL) with discrete associative recall. We pretrain transformer models on sample traces from this toy, specifically symbolically-labeled interleaved state observations from randomly drawn linear deterministic dynamical systems. We study if the transformer models can recall the state of a sequence previously seen in its context when prompted to do so with the corresponding in-context label. Taking a closer look at this task, it becomes clear that the model must perform two functions: (1) identify which system's state should be recalled and apply that system to its last seen state, and (2) continuing to apply the correct system to predict the subsequent states. Training dynamics reveal that the first capability emerges well into a model's training. Surprisingly, the second capability, of continuing the prediction of a resumed sequence, develops much earlier. Via out-of-distribution experiments, and a mechanistic analysis on model weights via edge pruning, we find that next-token prediction for this toy problem involves at least two separate mechanisms. One mechanism uses the discrete symbolic labels to do the associative recall required to predict the start of a resumption of a previously seen sequence. The second mechanism, which is largely agnostic to the discrete symbolic labels, performs a "Bayesian-style" prediction based on the previous token and the context. These two mechanisms have different learning dynamics. To confirm that this multi-mechanism (manifesting as separate phase transitions) phenomenon is not just an artifact of our toy setting, we used OLMo training checkpoints on an ICL translation task to see a similar phenomenon: a decisive gap in the emergence of first-task-token performance vs second-task-token performance.

2601.14968 2026-06-18 cs.LG cs.AI Updated version

Beyond Similarity: Temporal Operator Attention for Time Series Analysis

Jevon Twitty, Vinh Pham, Nitiwith Rotchanarak, Viresh Pati, Yubin Kim, Shihao Yang, Jiecheng Lu

Affiliations: Georgia Institute of Technology

AI summary: Proposes Temporal Operator Attention (TOA), which augments attention with learnable operators to better handle signed and oscillatory transformations in time-series data, improving performance on time-series forecasting, anomaly detection, and classification.

Abstract

A persistent paradox in time-series forecasting is that structurally simple MLP and linear models often outperform high-capacity Transformers. We argue that this gap arises from a mismatch in the sequence-modeling primitive: while many time-series dynamics are governed by global temporal operators (e.g., filtering and harmonic structure), standard attention forms each output as a convex combination of inputs. This restricts its ability to represent signed and oscillatory transformations that are fundamental to temporal signal processing. We formalize this limitation as a simplex-constrained mixing bottleneck in softmax attention, which becomes especially restrictive for operator-driven time-series tasks. To address this, we propose Temporal Operator Attention (TOA), a framework that augments attention with explicit, learnable sequence-space operators, enabling direct signed mixing across time while preserving input-dependent adaptivity. To make dense $N \times N$ operators practical, we introduce Stochastic Operator Regularization, a high-variance dropout mechanism that stabilizes training and prevents trivial memorization. Across forecasting, anomaly detection, and classification benchmarks, TOA consistently improves performance when integrated into standard backbones such as PatchTST and iTransformer, with particularly strong gains in reconstruction-heavy tasks. These results suggest that explicit operator learning is a key ingredient for effective time-series modeling.
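The convex-combination limitation described in the abstract can be illustrated with a small sketch. This is not the TOA implementation; the toy signal, the attention scores, and the helper names `softmax` and `mix` are assumptions chosen for illustration. It only shows that a softmax row stays on the simplex, so its output cannot leave the range of the inputs, while a signed first-difference row can.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mix(weights, values):
    """Weighted sum of values: one output row of a mixing operator."""
    return sum(w * v for w, v in zip(weights, values))

# Hypothetical oscillating toy signal (an assumption, not data from the paper).
values = [1.0, -1.0, 1.0, -1.0]

# A softmax attention row: nonnegative weights that sum to 1 (a point on the simplex).
attn_row = softmax([0.3, 2.0, -1.0, 0.5])
attn_out = mix(attn_row, values)

# A signed operator row: the first difference x[3] - x[2], which needs a negative weight.
diff_row = [0.0, 0.0, -1.0, 1.0]
diff_out = mix(diff_row, values)

print("softmax output:", attn_out)     # always within [min(values), max(values)]
print("difference output:", diff_out)  # -2.0, outside the range of the inputs
```

Whatever scores are chosen, the softmax output stays between -1 and 1 for this signal, whereas the differencing row produces -2. That gap is the kind of signed mixing the abstract says explicit sequence-space operators are meant to recover.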

2606.01249 2026-06-18 cs.LG cs.CL Updated version

Trust Region On-Policy Distillation

Xingrun Xing, Haoqing Wang, Boyan Gao, Ziheng Li, Yehui Tang

Affiliations: Samsung Research; University of Oxford; Peking University

AI summary: Proposes Trust Region On-Policy Distillation (TrOPD), which uses credit assignment strategies and trust-region learning to address the training instability caused by teacher-student distribution mismatch, and outperforms existing methods on mathematical reasoning, code generation, and general-domain benchmarks.

Abstract

On-Policy Distillation (OPD) is a fundamental technique for efficient post-training of large language models (LLMs), with broad applications in agent learning, multi-task enhancement, and model compression. However, OPD training becomes unstable when the teacher and student distributions differ substantially, as teacher supervision on student-generated tokens may yield unreliable policy gradients and even cause optimization failure. This work addresses reliable on-policy token-level supervision through credit assignment strategies, and proposes Trust Region On-Policy Distillation, TrOPD. It features the following characteristics: 1) Trust-Region On-Policy Learning: TrOPD performs OPD only in regions where the teacher provides reliable supervision, mitigating the optimization difficulty of the K1 reverse-KL estimator under distribution mismatch. 2) Outlier Estimation: For outlier regions, we explore gradient clipping, masking, and forward-KL estimation to reduce the adverse effects of unreliable supervision. 3) Off-Policy Guidance: The student continues generation from teacher prefixes and uses forward KL to imitate off-policy guidance, encouraging on-policy exploration toward reliable regions. Experiments show that TrOPD consistently outperforms SoTA OPD baselines, including OPD, EOPD, and REOPOLD, across mathematical reasoning, code generation, and general-domain benchmarks.
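As a rough illustration of the first two features listed in the abstract, the sketch below computes the per-token K1 estimate of the reverse KL on student-sampled tokens and masks tokens that fall outside a trust region. The abstract does not state how the trust region is defined, so the rule used here (an absolute log-probability gap below `max_gap`), the function names, and the toy log-probabilities are all assumptions rather than the authors' method.

```python
def k1_reverse_kl(student_logps, teacher_logps):
    """Per-token K1 estimate of KL(student || teacher) for tokens sampled
    from the student: log pi_student(y) - log pi_teacher(y)."""
    return [s - t for s, t in zip(student_logps, teacher_logps)]

def trust_region_mask(k1_terms, max_gap):
    """Hypothetical trust-region rule (an assumption): keep a token only if
    the student and teacher log-probabilities differ by at most max_gap."""
    return [abs(k) <= max_gap for k in k1_terms]

def masked_mean(terms, mask):
    """Average the kept terms; masked-out tokens contribute nothing."""
    kept = [t for t, m in zip(terms, mask) if m]
    return sum(kept) / len(kept) if kept else 0.0

# Toy log-probabilities for four student-generated tokens (illustrative values).
student_logps = [-0.5, -1.0, -0.25, -0.75]
teacher_logps = [-0.75, -9.0, -0.5, -0.5]  # the teacher finds token 2 very unlikely

k1 = k1_reverse_kl(student_logps, teacher_logps)
mask = trust_region_mask(k1, max_gap=2.0)
loss = masked_mean(k1, mask)

print("k1 terms:", k1)
print("kept:", mask)
print("masked K1 loss:", loss)
```

In this toy batch the second token has a gap of 8 nats and is masked out, so it cannot dominate the averaged signal. The abstract also mentions gradient clipping and forward-KL estimation as alternatives to masking for such outlier tokens.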

2606.06564 2026-06-18 cs.LG cs.AI Updated version

HAARES Half-Split Residual Basis Routing for Deep Transformers

WAV: Multi-Resolution Block Residual Routing for Deep Decoder-Only Transformers

Kehan Wang

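Affiliations: Chongqing University

AI summary: Proposes WAV v1, which strengthens residual routing by adding directional detail bases (a phase basis and a split basis) to each block; it outperforms existing methods in deep Transformers, reaching lower validation loss at 48 layers on TinyStories and Text8.

Comments: 6 pages, 4 figures, 3 tables

AI abstract (translated from Chinese)

Residual connections are essential for training deep Transformers, but the standard PreNorm residual stream aggregates sublayer updates with fixed unit weights. Recent attention residuals replace this fixed accumulation with content-dependent depth routing, and block attention residuals make the mechanism efficient by routing over block-level residual summaries. However, a single block summary stores only the low-frequency total residual displacement within the block and discards directional structure, such as the imbalance between attention and MLP updates and early-versus-late block dynamics. We propose WAV v1, a lightweight multi-resolution residual routing method for decoder-only Transformers. Rather than representing each block only by its cumulative residual sum, WAV v1 adds two directional detail bases per block: a phase basis that contrasts attention and MLP updates, and a split basis that contrasts early and late sublayer updates. These bases are routed together with the standard block summary through the same depth-wise softmax mixer, while negative detail-source initialization and detached RMS matching stabilize training. On character-level TinyStories and Text8 language modeling, WAV v1 shows a clear depth-dependent advantage. Although it is not consistently beneficial at 12 layers, it becomes competitive at 24 layers and outperforms all baselines at 48 layers. At 48 layers, WAV v1 reduces validation loss from 0.4960 to 0.4738 on TinyStories and from 0.9363 to 0.9305 on Text8, with negligible additional parameters. These results suggest that directional residual detail, not just block-level sums, matters for scaling residual routing in deeper Transformers.

Abstract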

Block-level residual routing makes learned residual aggregation practical by routing over block summaries, but each summary compresses an ordered sequence of attention and MLP updates into one cumulative vector. We propose HAARES, a lightweight residual basis router that keeps the cumulative block source and adds one half-split detail basis, computed as the difference between first-half and second-half residual updates. The detail basis is RMS-matched and updated online, exposing coarse intra-block trajectory information without dense sublayer-level routing. Across OpenWebText, cross-domain character-level benchmarks, and BPE-tokenized OpenWebText, the empirical pattern is depth-dependent: gains are small or mixed at shallow depth and most reliable in 48-layer models. In the 201M 48-layer setting, HAARES improves over Block AttnRes across all three seeds, while a 453M two-seed probe shows the same direction. Ablations rule out source duplication, random signed details, fixed detail-source biases, or block-count changes alone. Cost analysis shows that the method is FLOP-light but not wall-clock-free: it adds memory and routing overhead, yet its relative arithmetic cost is amortized as width grows and earlier convergence can reduce time-to-target.
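The half-split detail basis described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's code: the function names are hypothetical, the residual updates are toy vectors, and "RMS-matched" is interpreted here as rescaling the detail vector to the RMS of the cumulative source, which is an assumption. The online update mentioned in the abstract is not modeled.

```python
import math

def rms(v):
    """Root mean square of a vector."""
    return math.sqrt(sum(x * x for x in v) / len(v))

def block_sources(updates, eps=1e-8):
    """Return (cumulative, detail) for one block.

    updates: ordered list of residual update vectors produced inside the block.
    cumulative: sum of all updates (the usual block summary).
    detail: first-half sum minus second-half sum, rescaled so its RMS
            matches the cumulative source (assumed meaning of RMS matching).
    """
    dim = len(updates[0])
    half = len(updates) // 2
    first = [sum(u[i] for u in updates[:half]) for i in range(dim)]
    second = [sum(u[i] for u in updates[half:]) for i in range(dim)]
    cumulative = [f + s for f, s in zip(first, second)]
    detail = [f - s for f, s in zip(first, second)]
    scale = rms(cumulative) / (rms(detail) + eps)
    return cumulative, [scale * d for d in detail]

# Toy block with four sublayer updates in a 2-dimensional residual stream.
updates = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 2.0]]
cumulative, detail = block_sources(updates)

print("cumulative source:", cumulative)
print("RMS-matched detail:", detail)
```

Two blocks with the same cumulative sum but a different ordering of large and small updates would share a cumulative source and still differ in the detail vector, which is the intra-block trajectory information the abstract refers to.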

2606.02800 2026-06-18 cs.CV cs.AI cs.LG cs.MM cs.RO Updated version

Cosmos 3: Omnimodal World Models for Physical AI

NVIDIA: Aditi, Niket Agarwal, Arslan Ali, Jon Allen, Martin Antolini, Adeline Aubame, Alisson Azzolini, Junjie Bai, Maciej Bala, Yogesh Balaji, Josh Bapst, Aarti Basant, Mukesh Beladiya, Mohammad Qazim Bhat, Zaid Pervaiz Bhat, Dan Blick, Vanni Brighella, Han Cai, Tiffany Cai, Eric Cameracci, Jiaxin Cao, Yulong Cao, Mark Carlson, Carlos Casanova, Ting-Yun Chang, Yan Chang, Yu-Wei Chao, Prithvijit Chattopadhyay, Roshan Chaudhari, Chieh-Yun Chen, Junyu Chen, Ke Chen, Qizhi Chen, Wenkai Chen, Xiaotong Chen, Yu Chen, An-Chieh Cheng, Click Cheng, Xiu Chia, Jeana Choi, Chaeyeon Chung, Wenyan Cong, Yin Cui, Magdalena Dadela, Nalin Dadhich, Wenliang Dai, Joyjit Daw, Alperen Degirmenci, Rodrigo Vieira Del Monte, Robert Denomme, Sameer Dharur, Marco Di Lucca, Ke Ding, Wenhao Ding, Yifan Ding, Yuzhu Dong, Nicole Drumheller, Yilun Du, Aigul Dzhumamuratova, Aleksandr Efitorov, Hamid Eghbalzadeh, Naomi Eigbe, Imad El Hanafi, Hassan Eslami, Benedikt Falk, Jiaojiao Fan, Jim Fan, Amol Fasale, Sergiy Fefilatyev, Liang Feng, Francesco Ferroni, Sanja Fidler, Xiao Fu, Vikram Fugro, Prashant Gaikwad, TJ Galda, Katelyn Gao, Yihuai Gao, Wenhang Ge, Sreyan Ghosh, Arushi Goel, Vivek Goel, Akash Gokul, Rama Govindaraju, Jinwei Gu, Miguel Guerrero, Elfie Guo, Aryaman Gupta, Siddharth Gururani, Hugo Hadfield, Song Han, Ankur Handa, Zekun Hao, Mohammad Harrim, Ali Hassani, Nathan Hayes-Roth, Yufan He, Chris Helvig, Cyrus Hogg, Madison Huang, Michael Huang, Sophia Huang, Yufan Huang, Jacob Huffman, DeLesley Hutchins, Suneel Indupuru, Boris Ivanovic, Arihant Jain, Joel Jang, Ryan Ji, Yanan Jian, Dongfu Jiang, Jingyi Jin, Atharva Joshi, Nikhilesh Joshi, Pranjali Joshi, Andy Ju, Jaehun Jung, Weiwei Kang, Scott Kassekert, Jan Kautz, Ashna Khetan, Julia Kiczka, Slawek Kierat, Gwanghyun Kim, Kuno Kim, Sunny Kim, Kezhi Kong, Xin Kong, Zhifeng Kong, Tomasz Kornuta, Egor Krivov, Hui Kuang, Saurav Kumar, Chia-Wen Kuo, George Kurian, Wojciech Kutak, JF Lafleche, Himangshu Lahkar, Omar Laymoun, Jayjun Lee, Sanggil Lee, Gabriele Leone, Boyi Li, Freya Li, Jiajun Li, Jinfeng Li, Ling Li, Pengcheng Li, Shangru Li, Tingle Li, Xiaolong Li, Xuan Li, Zhaoshuo Li, Zhiqi Li, Hao Liang, Maosheng Liao, Chen-Hsuan Lin, Tsung-Yi Lin, Ming-Yu Liu, Sifei Liu, Zihan Liu, Hai Loc Lu, Xiangyu Lu, Alice Luo, Ruipu Luo, Wenjie Luo, Jiangran Lyu, Martin Ding Ma, Nic Ma, Qianli Ma, Dawid Majchrowski, Louis Marcoux, Miguel Martin, Qing Miao, Ashkan Mirzaei, Shreyas Misra, Kaichun Mo, Durra Mohsin, Hyejin Moon, Pawel Morkisz, Saeid Motiian, Kirill Motkov, Seungjun Nah, Yashraj Narang, Deepak Narayanan, Thabang Ngazimbi, Julian Ouyang, Shubham Pachori, David Page, Yatian Pang, Sehwi Park, Mahesh Patekar, Mostofa Patwary, Marco Pavone, Trung Pham, Wei Ping, Soha Pouya, Shrimai Prabhumoye, Varun Praveen, Delin Qu, Hesam Rabeti, Morteza Ramezanali, Marilyn Reeb, Xuanchi Ren, Kristen Rumley, Wojciech Rymer, Jun Saito, Yeongho Seol, John Shao, Piyush Shekdar, Tianwei Shen, Humphrey Shi, Min Shi, Stella Shi, Kevin Shih, Mohammad Shoeybi, Mateusz Sieniawski, Shuran Song, Alexander Sotelo, Amir Sotoodeh, Sunil Srinivasa, Vignesh Srinivasakumar, Bartosz Stefaniak, Rahul Heinrich Steiger, Shangkun Sun, Jiaxiang Tang, Shitao Tang, Yangyang Tang, Yue Tang, Tolou Tavakkoli, Kayley Ting, Krzysztof Tomala, Wei-Cheng Tseng, Jibin Varghese, Sergei Vasilev, Thomas Volk, Raju Wagwani, Roger Waleffe, Andrew Z. Wang, Boxiang Wang, Haoxiang Wang, Qiao Wang, Shihao Wang, Shijie Wang, Ting-Chun Wang, Yan Wang, Yu Wang, Rohit Watve, David Wehr, Fangyin Wei, Xinshuo Weng, Jay Zhangjie Wu, Kedi Wu, Hongchi Xia, Summer Xiao, Tianjun Xiao, Kevin Xie, Daguang Xu, Jiashu Xu, Mengyao Xu, Ruqing Xu, Xingqian Xu, Yao Xu, Dinghao Yang, Dong Yang, Hans Yang, Xiaodong Yang, Xuning Yang, Yichu Yang, Yurong You, Zhiding Yu, Hao Yuan, Simon Yuen, Xiaohui Zeng, Pengcuo Zeren, Cindy Zha, Haotian Zhang, Jenny Zhang, Jing Zhang, Liangkai Zhang, Paris Zhang, Shun Zhang, Xuanmeng Zhang, Zhizheng Zhang, Ann Zhao, Yilin Zhao, Yuliya Zhautouskaya, Charles Zhou, Fengzhe Zhou, Shilin Zhu, Yuke Zhu, Dima Zhylko, Artur Zolkowski

Affiliations: NVIDIA

AI summary: Presents Cosmos 3, an omnimodal world model built on a unified mixture-of-transformers architecture that jointly processes language, image, video, audio, and action sequences; it sets a new state of the art on understanding and generation tasks and offers a scalable, general-purpose backbone for embodied agents.

Abstract

We introduce Cosmos 3, a family of omnimodal world models designed to jointly process and generate language, image, video, audio, and action sequences within a unified mixture-of-transformers architecture. By supporting highly flexible input-output configurations, Cosmos 3 seamlessly unifies critical modalities for Physical AI -- effectively subsuming vision-language models, video generators, world simulators, and world-action models into a single framework. Our evaluation demonstrates that Cosmos 3 establishes a new state-of-the-art across a diverse suite of understanding and generation tasks, demonstrating omnimodal world models as scalable, general-purpose backbones for embodied agents. Our post-trained Cosmos 3 models were ranked as the best open-source Text-to-Image and Image-to-Video models by Artificial Analysis, and the best policy model by RoboArena at the time the technical report was written. To accelerate open research and deployment in Physical AI, we make our code, model checkpoints, curated synthetic datasets, and evaluation benchmark available under the Linux Foundation's OpenMDW-1.1 License at https://github.com/nvidia/cosmos and https://huggingface.co/collections/nvidia/cosmos3. The project website is available at https://research.nvidia.com/labs/cosmos-lab/cosmos3.

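2406.07775 2026-06-18 cs.LG Updated version

Zero-Shot Cross-City Generalization in End-to-End Autonomous Driving: Self-Supervised versus Supervised Representations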

Fatemeh Naeinian, Ali Hamza, Haoran Zhu, Anna Choromanska

Affiliations: Department of Electrical and Computer Engineering, NYU Tandon School of Engineering

AI summary: Studies how well end-to-end autonomous driving models generalize under zero-shot cross-city transfer, and finds that self-supervised pretraining (e.g., I-JEPA, DINOv2, MAE) can substantially reduce displacement and collision degradation compared with supervised pretraining and improves out-of-distribution PDMS in closed-loop evaluation.

Abstract

End-to-end autonomous driving models are typically trained on multi-city datasets using supervised ImageNet-pretrained backbones, yet their ability to generalize to unseen cities remains largely unexamined. When training and evaluation data are geographically mixed, models may implicitly rely on city-specific cues, masking failure modes that would occur under real-world domain shifts when generalizing to new locations. In this work, we formulate zero-shot cross-city transfer as a controlled representation-level stress test for end-to-end autonomous driving and ask how visual pretraining affects transfer behavior under geographic domain shift. We conduct a comprehensive study by integrating self-supervised backbones I-JEPA, DINOv2, and MAE into planning frameworks. We evaluate performance under strict geographic splits on nuScenes in the open-loop setting and on NAVSIM in the closed-loop evaluation protocol. Our experiments reveal a substantial generalization gap when transferring models across cities with different road topologies, traffic conventions, and visual environments. In open-loop evaluation, a supervised backbone exhibits severe degradation when transferring between cities, yet some domain-specific self-supervised methods can substantially reduce both displacement and collision degradation. In closed-loop evaluation, self-supervised pretraining improves average out-of-distribution PDMS in several single-city training settings. Our results provide empirical evidence that representation learning influences the robustness of cross-city planning and motivate zero-shot geographic transfer as an important stress test for evaluating end-to-end autonomous driving systems.

2507.17786 2026-06-18 cs.LG Updated version

Reinforcement Learning for Accelerated Aerodynamic Shape Optimisation

Florian Sobieczky, Alfredo Lopez, Erika Dudkin, Christopher Lackner, Matthias Hochsteger, Bernhard Scheichl, Helmut Sobieczky

Affiliations: Software Competence Center Hagenberg (SCCH); Institut für Strömungsmechanik und Wärmeübertragung, TU Wien; CERBSim GmbH

AI summary: Proposes a reinforcement-learning-based adaptive optimization algorithm that uses a surrogate-based MCMC approach with actor-critic policy evaluation and freezes some parameters to reduce dimensionality, accelerating aerodynamic shape optimization; a simple fluid-dynamics problem demonstrates its ability to support feature-importance interpretation.

Abstract

We introduce a reinforcement learning (RL) based adaptive optimization algorithm for aerodynamic shape optimization focused on dimensionality reduction. The form in which RL is applied here is that of a surrogate-based, actor-critic policy evaluation MCMC approach allowing for temporal 'freezing' of some of the parameters to be optimized. The goals are to minimize computational effort, and to use the observed optimization results for interpretation of the discovered extrema in terms of their role in achieving the desired flow-field. By a sequence of local optimized parameter changes around intermediate CFD simulations acting as ground truth, it is possible to speed up the global optimization if (a) the local neighbourhoods of the parameters in which the changed parameters must reside are sufficiently large to compete with the grid-sized steps and its large number of simulations, and (b) the estimates of the rewards and costs on these neighbourhoods necessary for a good step-wise parameter adaption are sufficiently accurate. We give an example of a simple fluid-dynamical problem on which the method allows interpretation in the sense of a feature importance scoring.

2604.03208 2026-06-18 cs.LG Updated version

Hierarchical Planning with Latent World Models

Wancong Zhang, Basile Terver, Artem Zholus, Soham Chitnis, Harsh Sutaria, Mido Assran, Randall Balestriero, Amir Bar, Adrien Bardes, Yann LeCun, Nicolas Ballas

Affiliations: FAIR at Meta; New York University; Mila - Québec AI Institute; Brown University

AI summary: Proposes the HWM architecture, which performs hierarchical model predictive control using latent world models at multiple temporal scales and latent matching, addressing the failure of single-level planning and the compute blow-up on long-horizon tasks.

Abstract

World models are a promising path to zero-shot embodied control through planning. However, existing world model planners struggle on long-horizon, multi-stage tasks: prediction errors compound and naive search is exponential in the planning horizon. Hierarchy mitigates both by decomposing tasks into shorter, tractable subproblems; yet prior hierarchical approaches either amortize control into task-specific policies (hierarchical RL) or assume low-dimensional states and known dynamics (classical hierarchical MPC). We present Hierarchical Planning with Latent World Models (HWM), an architecture and planning paradigm for hierarchical model predictive control (MPC) directly on visual world models trained solely via next-latent prediction. HWM learns world models at multiple temporal scales within a shared latent space, so predictions from the long-horizon model serve as subgoals for the short-horizon model via latent matching, without task-specific rewards, skill learning, or hierarchical policies. To keep long-horizon search tractable, HWM learns an action encoder that compresses primitive action chunks into latent macro-actions. On real-world Franka manipulation, HWM solves pick-and-place from a single goal image at 70% success vs. 0% for single-level planning. Across simulated push manipulation and maze navigation, HWM consistently improves performance on long-horizon tasks while requiring up to 3x less planning compute.

2605.22142 2026-06-18 cs.LG cs.AI Updated version

Short-Term-to-Long-Term Memory Transfer for Knowledge Graphs under Partial Observability

Taewoon Kim, Vincent François-Lavet, Michael Cochez

AI summary: Studies short-term-to-long-term memory transfer for knowledge graphs under partial observability and proposes a neuro-symbolic value-based decision approach that chooses whether to keep or drop each observed triple before long-term insertion, improving memory efficiency and outperforming symbolic and neural baselines on the RoomKG benchmark.

Abstract

Reinforcement learning under partial observability requires deciding what information to retain, yet most memory-based approaches do not explicitly model short-term-to-long-term transfer of symbolic observations. We study this transfer process in a temporal knowledge-graph memory setting and cast it as a neuro-symbolic value-based decision problem: for each observed triple, the agent chooses whether to keep or drop it before long-term insertion. To handle variable-sized short-term buffers, we use a per-item Q-learning design with shared parameters and a practical temporal-difference update over matched items across consecutive steps. On the RoomKG benchmark at long-term memory capacity 128, learned transfer decisions outperform symbolic and neural baselines, including symbolic baselines with temporal annotations and history-based LSTM/Transformer baselines. Across transfer-policy ablations, a lightweight local short-term-only variant performs best, and step-level behavior shows that the policy keeps navigation- and query-relevant facts while discarding lower-value candidate facts, supporting explicit and interpretable memory decisions under memory constraints.

2606.12808 2026-06-18 cs.LG cs.AI Updated version

SymQNet: Amortized Acquisition for Low-Latency Adaptive Hamiltonian Learning

Yash Vardhan Tomar, Dheeraj Peddireddy

Affiliations: University of California, Berkeley

AI summary: Proposes SymQNet, an amortized reinforcement-learning approach that learns a posterior-conditioned acquisition policy offline and runs a fast forward pass online, substantially reducing acquisition latency in adaptive Hamiltonian learning.

Abstract

Adaptive Hamiltonian learning is central to calibrating and characterizing quantum devices. In an adaptive controller, choosing the next experiment is itself a computation. Bayesian design rules are recomputed after every posterior update, and that step can take seconds. Across hundreds of shots, those seconds become a significant wall-clock cost for adaptivity. We introduce SymQNet, an amortized reinforcement-learning approach for low-latency adaptive Hamiltonian learning. SymQNet learns a posterior-conditioned acquisition policy offline, then uses a fast policy forward pass online while retaining Bayesian posterior feedback. On transverse-field Ising benchmarks, SymQNet substantially reduces acquisition latency relative to bounded Fisher-information search and bounded two-step Bayesian active learning by disagreement (BALD). At five qubits, it reduces acquisition-only decision latency by $47.1\times$ and $72.6\times$ relative to these online baselines; at twelve qubits, full simulated steps take $1.02$ s for SymQNet versus $13.27$ s for bounded two-step BALD. Overall, we show that learned acquisition can make adaptive Hamiltonian learning practical for repeated low-latency workloads.
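The twelve-qubit timings quoted in the abstract imply a full-step speedup of roughly 13x, which is easy to check. The short calculation below uses only the two per-step figures from the abstract; the run length of 300 steps is a hypothetical value chosen to illustrate the "hundreds of shots" remark, not a number from the paper.

```python
# Per-step wall-clock times quoted in the abstract (twelve qubits), in seconds.
SYMQNET_STEP_S = 1.02
BALD_STEP_S = 13.27

# Hypothetical run length (an assumption used only for illustration).
HYPOTHETICAL_STEPS = 300

def speedup(baseline_s, method_s):
    """Ratio of baseline time to method time for a single step."""
    return baseline_s / method_s

def time_saved(baseline_s, method_s, num_steps):
    """Total wall-clock seconds saved over num_steps adaptive steps."""
    return (baseline_s - method_s) * num_steps

step_speedup = speedup(BALD_STEP_S, SYMQNET_STEP_S)
saved_s = time_saved(BALD_STEP_S, SYMQNET_STEP_S, HYPOTHETICAL_STEPS)

print(f"full-step speedup: {step_speedup:.1f}x")
print(f"time saved over {HYPOTHETICAL_STEPS} steps: {saved_s / 60:.1f} min")
```

Under these assumptions the saving is about an hour per 300-step run. The $47.1\times$ and $72.6\times$ figures in the abstract refer to acquisition-only decision latency at five qubits, so they are not directly comparable to this full-step ratio.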

2511.00802 2026-06-18 cs.SE cs.CL cs.LG Updated version

GrowthHacker: Automated Off-Policy Evaluation Optimization Using Code-Modifying LLM Agents

Jie JW Wu, Ayanda Patrick Herlihy, Ahmad Saleem Mirza, Ali Afoud, Fatemeh Fard

Affiliations: Michigan Technological University, Houghton; Birmingham City University; University of British Columbia, Kelowna

AI summary: Proposes the GrowthHacker benchmark, in which LLM agents automatically and iteratively modify code to optimize off-policy evaluation (OPE) implementations; evaluating several frameworks on Open Bandit Pipeline and Scope-RL shows that LLM-based agents can serve as automated growth hackers that continuously improve OPE systems.

Comments: Accepted for publication in ACM Transactions on Software Engineering and Methodology (TOSEM), 2026

DOI: 10.1145/3815588

Abstract

With data-driven development now widely adopted, online A/B testing is an established method for measuring the effects of new technologies. However, deploying online experiments demands resources for design, implementation, and deployment, and may negatively impact users (e.g., unsafe or unethical outcomes) while requiring weeks of data collection. To address this, the growing research area of off-policy evaluation (OPE), or offline A/B testing, assesses new technologies offline using previously collected logged data. OPE is also a fundamental problem in reinforcement learning and is important where online testing is expensive or risky, such as healthcare, recommender systems, education, and robotics. Despite advances in code-generation large language models (LLMs) and agentic workflows, little is known about whether and how LLMs and LLM-based agents can automatically optimize OPE implementations. We propose GrowthHacker, a benchmark that evaluates baseline LLMs and LLM-based agents on large-scale public datasets. GrowthHacker autonomously and iteratively modifies code, runs OPE, and uses the metrics to guide subsequent optimization. We evaluate methods on Open Bandit Pipeline (OBP) and Scope-RL, and develop a two_agent framework that addresses limitations of existing frameworks while reducing complexity. Across both libraries, two_agent shows the highest reliability (98.1%-100% success rate) and positive-outcome rate (78%), with a median improvement of 4.4% among positive outcomes; CrewAI achieves the highest average improvement (37.9%) and is the only framework with zero extreme-value failures. AutoGen and Default each reach 65% positive-outcome rates. These results establish the feasibility of using LLM-based agents as automated "growth hackers" to continuously improve OPE systems, with implications for scaling data-driven decision-making where manual optimization is expensive.
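For readers unfamiliar with off-policy evaluation, the sketch below shows one standard OPE estimator, inverse propensity scoring (IPS), on a tiny logged dataset. It is a generic textbook illustration, not code from GrowthHacker and not the Open Bandit Pipeline or Scope-RL API; the function names, the logged tuples, and the target policy are all made up for the example.

```python
def ips_estimate(logged, target_policy):
    """Inverse propensity scoring estimate of a target policy's value.

    logged: list of (context, action, reward, behavior_prob) tuples, where
            behavior_prob is the probability the logging policy gave the action.
    target_policy: function (context, action) -> probability under the new policy.
    """
    total = 0.0
    for context, action, reward, behavior_prob in logged:
        weight = target_policy(context, action) / behavior_prob  # importance weight
        total += weight * reward
    return total / len(logged)

def always_action_one(context, action):
    """Hypothetical deterministic target policy: always choose action 1."""
    return 1.0 if action == 1 else 0.0

# Toy log collected by a uniform-random behavior policy over two actions.
logged = [
    ("u1", 1, 1.0, 0.5),
    ("u2", 0, 0.0, 0.5),
    ("u3", 1, 0.0, 0.5),
    ("u4", 0, 1.0, 0.5),
]

value = ips_estimate(logged, always_action_one)
print("estimated value of target policy:", value)
```

Only the two records where the logged action matches the target policy contribute, each with weight 2, giving an estimate of 0.5. Estimator choices and hyperparameters of this kind are plausible targets for the iterative code modifications the abstract describes.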

2602.11467 2026-06-18 cs.LG 版本更新

PRISM: A 3D Probabilistic Neural Representation for Interpretable Shape Modeling

PRISM：一种用于可解释形状建模的三维概率神经表示

Yining Jiao, Sreekalyani Bhamidi, Carlton Jude Zdanski, Julia S Kimbell, Andrew Prince, Cameron P Worden, Samuel Kirse, Christopher Rutter, Benjamin H Shields, Jisan Mahmud, Marc Niethammer

发表机构 * Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, USA（北卡罗来纳大学教堂山分校计算机科学系）； Department of Computer Science, University of California San Diego, La Jolla, USA（加州大学圣地亚哥分校计算机科学系）； School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, USA（北卡罗来纳大学教堂山分校医学院）

AI总结提出PRISM框架，结合隐式神经表示与不确定性感知统计形状分析，通过封闭形式Fisher信息度量实现高效局部时间不确定性量化，在形状演化、个性化预测和异常检测任务中表现优异。

Comments ICML 2026, camera-ready version, 24 pages

详情

AI中文摘要

理解解剖形状如何响应发育协变量而演变——并量化其空间变化的不确定性——在医疗保健研究中至关重要。现有方法通常依赖于忽略空间异质性动态的全局时间扭曲公式。我们引入PRISM，一种新颖的框架，将隐式神经表示与不确定性感知统计形状分析相结合。PRISM建模给定协变量下形状的条件分布，提供总体均值和协变量依赖不确定性在任意位置的空间连续估计。一个关键的理论贡献是封闭形式的Fisher信息度量，通过自动微分实现高效、解析可处理的局部时间不确定性量化。在三个合成数据集和一个临床数据集上的实验表明，PRISM在统一框架内从建模形状演化到个性化形状预测和异常检测等多样化任务中表现出色，同时提供可解释且临床有意义的不确定性估计。

英文摘要

Understanding how anatomical shapes evolve in response to developmental covariates - and quantifying their spatially varying uncertainties - is critical in healthcare research. Existing approaches typically rely on global time-warping formulations that ignore spatially heterogeneous dynamics. We introduce PRISM, a novel framework that bridges implicit neural representations with uncertainty-aware statistical shape analysis. PRISM models the conditional distribution of shapes given covariates, providing spatially continuous estimates of both the population mean and covariate-dependent uncertainty at arbitrary locations. A key theoretical contribution is a closed-form Fisher Information metric that enables efficient, analytically tractable local temporal uncertainty quantification via automatic differentiation. Experiments on three synthetic datasets and one clinical dataset demonstrate PRISM's strong performance across diverse tasks - from modeling shape evolution to personalized shape prediction and anomaly detection - within a unified framework, while providing interpretable and clinically meaningful uncertainty estimates.

2603.10718 2026-06-18 cs.LG 版本更新

Riemannian MeanFlow for One-Step Generation on Manifolds

Riemannian MeanFlow用于流形上的单步生成

Zichen Zhong, Haoliang Sun, Yukun Zhao, Yongshun Gong, Yilong Yin

发表机构 * School of Software, Shandong University, Jinan, China（软件学院，山东大学，济南，中国）

AI总结本文提出Riemannian MeanFlow（RMF），通过平行运输定义平均速度场，并推导出将平均速度与瞬时速度联系起来的Riemannian MeanFlow恒等式，从而实现流形上基于位置的切空间中的单步生成，改进了生成质量与效率的权衡并降低了采样成本。

Comments ICML 2026

详情

英文摘要

Flow Matching enables simulation-free training of generative models on Riemannian manifolds, yet sampling typically still relies on numerically integrating a probability-flow ODE. We propose Riemannian MeanFlow (RMF), extending MeanFlow to manifold-valued generation where velocities lie in location-dependent tangent spaces. RMF defines an average-velocity field via parallel transport and derives a Riemannian MeanFlow identity that links average and instantaneous velocities for intrinsic supervision. We make this identity practical in a log-map tangent representation, avoiding trajectory simulation and heavy geometric computations. For stable optimization, we decompose the RMF objective into two terms and apply conflict-aware multi-task learning to mitigate gradient interference. RMF also supports conditional generation via classifier-free guidance. Experiments on spheres, tori, SO(3), and SE(3) demonstrate competitive one-step sampling with improved quality-efficiency trade-offs and substantially reduced sampling cost.

2604.04342 2026-06-18 cs.LG stat.ML 版本更新

Generative models for decision-making under distributional shift

分布偏移下决策的生成模型

Xiuyuan Cheng, Yunqin Zhu, Yao Xie

发表机构 * Department of Mathematics, Duke University（杜克大学数学系）； H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology（佐治亚理工学院H. Milton Stewart工业与系统工程学院）

AI总结本文提出基于流和分数生成模型的统一框架，通过传输映射、速度场等工具处理分布偏移下的决策问题，实现鲁棒性、条件分布生成及不确定性量化。

Comments INFORMS TutORials in Operations Research, 2026

详情

AI中文摘要

许多数据驱动的决策问题使用从历史数据估计的名义分布来制定，而性能最终由可能发生偏移、依赖于上下文、部分观测或由压力引起的部署分布决定。本教程介绍了现代生成模型，特别是基于流和分数的方法，作为构建决策相关分布的数学工具。从运筹学的角度来看，它们的主要价值不在于无约束的样本合成，而在于通过传输映射、速度场、分数场和引导随机动力学来表示和变换分布。我们提出了一个基于前推映射、连续性、Fokker-Planck方程、Wasserstein几何和概率空间优化的统一框架。在此框架内，生成模型可用于学习名义不确定性、构建用于鲁棒性的受压或最不利分布，以及在侧信息和部分观测下生成条件或后验分布。我们还强调了代表性的理论保证，包括迭代流模型的前向-反向收敛、传输映射空间中的一阶极小极大分析，以及具有生成先验的后验采样的误差传递界。本教程为在分布偏移下使用生成模型进行场景生成、鲁棒决策、不确定性量化及相关问题提供了原则性的介绍。

英文摘要

Many data-driven decision problems are formulated using a nominal distribution estimated from historical data, while performance is ultimately determined by a deployment distribution that may be shifted, context-dependent, partially observed, or stress-induced. This tutorial presents modern generative models, particularly flow- and score-based methods, as mathematical tools for constructing decision-relevant distributions. From an operations research perspective, their primary value lies not in unconstrained sample synthesis but in representing and transforming distributions through transport maps, velocity fields, score fields, and guided stochastic dynamics. We present a unified framework based on pushforward maps, continuity, Fokker-Planck equations, Wasserstein geometry, and optimization in probability space. Within this framework, generative models can be used to learn nominal uncertainty, construct stressed or least-favorable distributions for robustness, and produce conditional or posterior distributions under side information and partial observation. We also highlight representative theoretical guarantees, including forward-reverse convergence for iterative flow models, first-order minimax analysis in transport-map space, and error-transfer bounds for posterior sampling with generative priors. The tutorial provides a principled introduction to using generative models for scenario generation, robust decision-making, uncertainty quantification, and related problems under distributional shift.
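
The three objects the abstract builds on (pushforward maps, the continuity equation, and the Fokker-Planck equation) have standard textbook forms, written out below for reference. The symbols $T$, $\mu$, $\rho_t$, $v_t$, $b_t$, and $\sigma$ are notation chosen here, not necessarily the tutorial's own.

```latex
% Pushforward of a distribution \mu under a transport map T
(T_{\#}\mu)(A) = \mu\bigl(T^{-1}(A)\bigr) \quad \text{for measurable } A,
\qquad\text{equivalently}\qquad X \sim \mu \;\Rightarrow\; T(X) \sim T_{\#}\mu .

% Continuity equation for a density \rho_t transported by a velocity field v_t
\partial_t \rho_t + \nabla \cdot (\rho_t v_t) = 0 .

% Fokker-Planck equation for the SDE dX_t = b_t(X_t)\,dt + \sigma\,dW_t
\partial_t \rho_t = -\nabla \cdot (\rho_t b_t) + \tfrac{\sigma^2}{2}\,\Delta \rho_t .
```

Setting $\sigma = 0$ in the last line recovers the continuity equation with $v_t = b_t$, which is why flow-based and score-based models can be treated in one framework.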

2605.17232 2026-06-18 cs.LG math.ST stat.ML stat.TH 版本更新

Dimension-Free Convergence of Discrete Diffusion Models: Adjoint Equations Induce the Right Space

离散扩散模型的维度无关收敛性：伴随方程诱导了正确的空间

Kelvin Kan, Xingjian Li, Benjamin J. Zhang, Tuhin Sahai, Stanley Osher, Markos A. Katsoulakis

发表机构 * Department of Mathematics（数学系）； Oden Institute School of Data Science and Society（数据科学与社会学院）； UCLA（加州大学洛杉矶分校）； University of Texas at Austin（德克萨斯大学奥斯汀分校）； UNC Chapel Hill（北卡罗来纳大学教堂山分校）； Computational and Applied Sciences Group（计算与应用科学组）； Department of Mathematics and Statistics（数学与统计学系）； SRI International（SRI国际）； University of Massachusetts Amherst（马萨诸塞大学阿姆赫斯特分校）

AI总结本文提出了一种基于伴随方程的统一框架，实现了任何积分概率度量（IPM）下的维度无关收敛保证，克服了传统KL和TV方法在处理大规模状态空间时的局限性。

详情

AI中文摘要

离散扩散已成为生成建模中的领先框架，广泛应用于语言、视觉和生物学等领域。然而，现有的收敛理论存在根本性局限。基于KL的分析在奇异先验如掩码分布下会发散，而总变差（TV）的界依赖于状态空间大小S，并在现代语言任务中变得无效，因为词汇表包含数以万计的标记。我们开发了一种统一的基于伴随方程的框架，建立了任何积分概率度量（IPM）下的维度无关收敛保证。到目前为止，我们的界是首个完全不依赖S且适用于掩码和均匀先验的。重要的是，我们的理论仅依赖于一个标准的速率矩阵正则性假设，并且兼容时间非齐次调度。四个新颖的技术推动了我们的改进：通过伴随方程在可观测空间中工作而不是直接处理概率测度，一种产生任何IPM界正则性分析，一种耦合论证在均匀转移下去除S依赖性，以及一种分数-边际抵消技术在掩码转移下去除S依赖性。因此，我们的框架与先前分析显著不同，并避免了路径空间-KL和现有TV方法的不足。除了收敛界外，我们的框架还提供了一种灵活的工具包，用于进一步理论研究离散扩散模型。

英文摘要

Discrete diffusion has become a leading framework for generative modeling in various applications including language, vision, and biology. Existing convergence theory, however, exhibits fundamental limitations. KL-based analyses diverge under singular priors such as the masked distribution, while bounds in total variation (TV) depend on the state space size $S$ and become vacuous for modern language tasks, where vocabularies contain hundreds of thousands of tokens. We develop a unified adjoint-equation-based framework that establishes dimension-free convergence guarantees in any integral probability metric (IPM). To the best of our knowledge, our bounds are the first to be entirely free of $S$ and applicable to both masked and uniform priors. Importantly, our theory relies only on a single standard rate-matrix regularity assumption and applies to general priors. Five novel techniques drive our improvements: working in the space of observables via adjoint equations rather than directly with probability measures, a regularity analysis that yields bounds on any IPM, a coupling argument that removes $S$-dependence under uniform transitions, and score-marginal cancellation and exit-routing techniques that remove $S$-dependence under masked transitions. Our framework thus sharply departs from prior analyses and avoids the shortcomings of pathspace-KL and existing TV-based approaches. Beyond convergence bounds, our framework provides a versatile toolkit for further theoretical study of discrete diffusion models, including principled choices of loss functions and dimension-free step complexity.

2605.30920 2026-06-18 cs.LG 版本更新

Unsupervised Diffusion Solver for Combinatorial Optimization via Combinatorial Adjoint Matching

通过组合伴随匹配实现组合优化的无监督扩散求解器

Shengyu Feng, Tarun Suresh, Yiming Yang

发表机构 * Language Technologies Institute, Carnegie Mellon University（卡内基梅隆大学语言技术研究所）； University of Illinois Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校）

AI总结提出组合伴随匹配（CAM）框架，利用离散伴随动力学和随机控制公式，实现无监督训练离散扩散求解器，在多种组合优化问题上达到与监督方法竞争的性能。

Comments ICML26

详情

AI中文摘要

基于扩散的神经求解器在组合优化（CO）中显示出强大潜力，但现有方法通常依赖于使用大量近最优解进行监督训练。在这项工作中，我们将基于伴随的轨迹优化方法扩展到离散组合域。我们将基于扩散的CO表述为连续时间马尔可夫链上的随机控制问题，并引入离散伴随动力学，用于通过离散生成轨迹传播优化信号。基于这一表述，我们提出了组合伴随匹配（CAM），一种用于离散扩散求解器的无监督训练框架，具有结构化和低方差的轨迹级优化信号。实验上，CAM在多种组合优化问题上始终优于现有的无监督扩散基线，并与强大的监督扩散求解器甚至传统求解器性能相当。我们的代码可在 https://github.com/Shengyu-Feng/CAM 获取。

英文摘要

Diffusion-based neural solvers have shown strong promise for combinatorial optimization (CO), but existing methods typically rely on supervised training with large collections of near-optimal solutions. In this work, we extend adjoint-based trajectory optimization methods to discrete combinatorial domains. We formulate diffusion-based CO as a stochastic control problem over Continuous-Time Markov Chains and introduce discrete adjoint dynamics for propagating optimization signals through discrete generative trajectories. Building on this formulation, we propose Combinatorial Adjoint Matching (CAM), an unsupervised training framework for discrete diffusion solvers with structured and low-variance trajectory-level optimization signals. Empirically, CAM consistently outperforms existing unsupervised diffusion baselines and achieves performance competitive with strong supervised diffusion solvers and even traditional solvers across diverse combinatorial optimization problems. Our code is available at https://github.com/Shengyu-Feng/CAM.

2606.10466 2026-06-18 cs.LG cs.AI 版本更新

UPLOTS: A Unified Pretrained Language Model for Constrained Time-series Generation

UPLOTS: 一种用于约束时间序列生成的统一预训练语言模型

Du Yin, Hao Xue, Jinliang Deng, Yang Yang, Shuang Ao, Arian Prabowo, Flora Salim

发表机构 * University of New South Wales（新南威尔士大学）； HKUST(GZ)（香港科技大学（广州））； BUAA（北京航空航天大学）

AI总结提出UPLOTS，一种基于统一预训练语言模型和提示引导的框架，通过动态多数据集损失重加权和提示到模式映射，实现跨领域约束时间序列生成，在四个基准上验证了其泛化性和数据增强效果。

详情

AI中文摘要

三角参考薛定谔桥用于时间序列生成

Gabriele Bocchi

发表机构 * Arakne S.r.l.（阿拉克内公司）

AI总结提出三角参考薛定谔桥框架，通过区间冻结的退化扩散参考和层次化潜在波动率结构，实现时间序列的保守生成，并保持熵最小化的变分核心。

详情

AI中文摘要

我们引入了用于时间序列的三角参考薛定谔桥（TR-SBTS），这是SBTS框架的一种保守扩展，其中布朗参考被替换为区间冻结的、可能退化的扩散参考，在潜在波动率水平的层次上呈三角形。该构造是在增广状态空间上的单一熵投影，变分约束在时间和潜在水平上联合施加，并通过相对熵的分解层次展开。SBTS的变分核心得以保留：熵最小化器是参考的h-变换，在每个冻结区间上，最优动力学在活跃协方差方向的仿射叶上具有对数梯度漂移公式，即使冻结协方差是秩亏的也成立。我们建立了冻结近似的稳定性以及相应正则化核估计量的收敛性。该构造通过一个有限维条件映射实现，该映射由三种互补的过去约简组成——块PCR摘要、由运行时冻结协方差累积量诱导的过去增量的参考感知马氏核，以及在同一参考度量下的过去窗口WLS漂移回归器——以及一个耦合的状态-协方差桥步骤，其中每个潜在水平为上一水平产生动态参考，并由协方差描述符总结；该构造在数值实验上进行了评估。

英文摘要

Schrödinger bridges for time series (SBTS) generate synthetic paths by projecting, in relative entropy, a Brownian reference onto the path laws that match the joint distribution of the data on the observation grid. The Brownian reference, however, fixes the quadratic variation of the generated paths, which is restrictive when stochastic volatility, correlated noise, or rank-deficient covariance structures must be reproduced. We introduce "Triangular-Reference Schrödinger Bridges for Time Series" (TR-SBTS), which keeps the entropy-projection backbone of SBTS but replaces the Brownian reference by a triangular, volatility-informed, intervalwise frozen reference on a state augmented with latent covariance descriptors. The construction remains a single entropy projection on the augmented state: the minimiser is the $h$-transform of the reference, and on each frozen interval the optimal drift has the logarithmic-gradient form $b^\star(t,x)=A\,\nabla\log H(t,x)$, intrinsic to the active covariance directions when the frozen covariance $A$ is degenerate. We prove stability of the frozen approximation and consistency of the associated regularised kernel estimators, describe a reference-aware Nadaraya--Watson implementation of the conditional next-increment law, and evaluate the construction on numerical experiments.
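
The abstract mentions a Nadaraya-Watson implementation of the conditional next-increment law. The sketch below shows only the plain Nadaraya-Watson estimator with a Gaussian kernel; the reference-aware variant of the paper is not reproduced, and the function name and bandwidth parameter are assumptions made for illustration.

```python
import math
from typing import Sequence


def nadaraya_watson(
    x_query: float,
    xs: Sequence[float],
    ys: Sequence[float],
    bandwidth: float,
) -> float:
    """Kernel-weighted average of ys, with Gaussian weights centred at x_query."""
    # Samples whose x is close to the query receive larger weights.
    weights = [math.exp(-0.5 * ((x_query - x) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total
```

For time-series generation, `xs` would hold past states and `ys` the increments that followed them, so the estimator returns a locally averaged next increment.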

2605.28690 2026-06-18 quant-ph cs.LG 版本更新

Latent-Conditioned Parameterized Quantum Circuits as Universal Approximators for Distributions over Quantum States

潜在条件参数化量子电路作为量子态分布的通用近似器

Quoc Hoan Tran, Koki Chinzei, Yasuhiro Endo, Hirotaka Oshima

发表机构 * Quantum Laboratory, Fujitsu Research, Fujitsu Limited（Fujitsu 研究所量子实验室， Fujitsu 有限公司）

AI总结提出潜在条件参数化量子电路（LPQC），通过经典神经网络将潜在变量映射到量子电路参数，证明其在1-Wasserstein距离下是密度算子概率测度的通用近似器，并引入多模态潜在先验和专家混合电路架构缓解贫瘠高原问题。

Comments 21 pages, 11 figures (fix the proof and update appendix for barren plateaus analysis)

详情

AI中文摘要

量子模拟、量子化学和量子机器学习中的许多应用不仅需要单个量子态，还需要表征目标系统异质性的量子态系综。在变分和容错设置中，逐个状态地准备这样的系综是不可行的，这激发了生成式建模方法。我们引入了潜在条件参数化量子电路（LPQC），这是一种混合量子-经典框架，其中经典神经网络将从先验分布中采样的潜在变量映射到参数化量子电路的参数。我们证明了LPQC在1-Wasserstein距离下是密度算子概率测度的通用近似器，将经典通用近似定理扩展到量子分布设置。我们还引入了多模态潜在先验和专家混合电路架构，并表明它在优化过程中经验性地缓解了贫瘠高原问题。数值实验在合成多簇混合量子态系综和QM9衍生的3D分子结构系综上验证了该框架。在这些任务中，LPQC优于最近的量子生成基线，同时与典型的经典基线相比，在输出维度大幅降低的情况下保持竞争力。通过利用潜在空间中的经典表达能力，LPQC为量子生成建模提供了一条可行的途径。

英文摘要

Many applications in quantum simulation, quantum chemistry, and quantum machine learning require not a single quantum state but an ensemble of states characterizing the heterogeneity of a target system. Preparing such ensembles state-by-state is prohibitive in both variational and fault-tolerant settings, thereby motivating a generative modeling approach. We introduce latent-conditioned parameterized quantum circuits (LPQCs), a hybrid quantum-classical framework in which classical neural networks map a latent variable sampled from a prior distribution to the parameters of a parameterized quantum circuit. We prove that LPQCs are universal approximators for probability measures over density operators in the 1-Wasserstein distance, extending classical universal approximation theorems to the quantum-distribution setting. We additionally introduce a multimodal latent prior and a mixture-of-experts circuit architecture, and show empirically that the latent-conditioned parameterization alleviates the barren plateau problem during optimization, a behavior for which we provide rigorous partial guarantees. Numerical experiments validate the framework on a synthetic multi-cluster ensemble of mixed quantum states and on a QM9-derived ensemble of 3-D molecular structures. In these tasks, LPQC outperforms recent quantum generative baselines and matches the generation quality of a classical neural-network baseline, while requiring an output dimension that grows only linearly with the number of qubits rather than exponentially. By leveraging classical expressivity in the latent space, LPQCs offer a tractable route to quantum generative modeling.

2606.17491 2026-06-18 stat.ML cs.LG stat.ME 版本更新

A Bayesian Boolean Matrix Factorization with Application to Copy Number Analysis in Cancer

贝叶斯布尔矩阵分解及其在癌症拷贝数分析中的应用

Adolphus Wagala, Mehmet Samur, Giovanni Parmigiani

发表机构 * Department of Data Science, Dana-Farber Cancer Institute（达纳-法伯癌症研究所数据科学系）； Department of Biostatistics, Harvard T.H. Chan School of Public Health（哈佛大学陈曾熙公共卫生学院生物统计学系）

AI总结提出贝叶斯布尔矩阵分解（BBMF）模型，通过全共轭生成模型和稀疏先验实现布尔约束下的可解释因子分解，并应用于多发性骨髓瘤的染色体臂拷贝数变异分析，揭示肿瘤异质性的离散潜在结构。

详情

AI中文摘要

二值数据分解很常见，但实值方法忽略了离散性并产生难以解释的因子。布尔矩阵分解（BooMF）通过逻辑与和或运算将二值矩阵分解为两个低秩二值矩阵，将数据表示为可解释模式的布尔析取。在癌症基因组学中，BooMF可以揭示可能驱动肿瘤演化的协调特征变化，这与旋转或加性分解不同。大多数现有的BooMF方法是启发式的、贪婪的、对初始化敏感、容易陷入局部最优，并且不支持原则性的模型选择或不确定性量化。我们引入了贝叶斯布尔矩阵分解（BBMF），这是一个具有稀疏诱导先验的全共轭生成模型。它强制执行布尔约束，产生具有一致不确定性量化的可解释潜在因子，并允许具有封闭形式全条件分布的吉布斯采样。由于癌症演化通常涉及广泛、近乎同时的染色体数目变化（例如，全基因组复制后伴随不稳定性和选择），布尔分解比加性模型更自然地捕捉这些模式。应用于多发性骨髓瘤的臂级拷贝数变异数据（其中条目指示染色体臂扩增的存在/缺失），BBMF找到了一小组可解释的双团，将患者子集与反复共变的染色体臂联系起来，提供了肿瘤异质性的紧凑、生物学上有意义的总结，并展示了BBMF在复杂二值数据中发现离散潜在结构的实用性。

英文摘要

Binary data factorization is common, but real-valued methods ignore discreteness and yield hard-to-interpret factors. Boolean Matrix Factorization (BooMF) instead decomposes a binary matrix into two lower-rank binary matrices via logical AND and OR, expressing the data as a Boolean disjunction of interpretable patterns. In cancer genomics, BooMF can reveal coordinated feature changes that may drive tumor evolution, unlike rotational or additive decompositions. Most existing BooMF methods are heuristic, greedy, sensitive to initialization, prone to local optima, and do not support principled model selection or uncertainty quantification. We introduce Bayesian Boolean Matrix Factorization (BBMF), a fully conjugate generative model with sparsity-inducing priors. It enforces Boolean constraints, yields interpretable latent factors with coherent uncertainty quantification, and admits Gibbs sampling with closed-form full conditionals. Because cancer evolution often involves widespread, near-simultaneous chromosome-number changes (e.g., whole-genome duplication followed by instability and selection), Boolean factorizations capture these patterns more naturally than additive models. Applied to arm-level copy-number alteration data in multiple myeloma, where entries indicate presence/absence of chromosomal-arm amplifications, BBMF finds a small set of interpretable bicliques linking patient subsets to recurrently co-altered chromosomal arms, providing a compact, biologically meaningful summary of tumor heterogeneity and demonstrating BBMF's utility for uncovering discrete latent structure in complex binary data.
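
The Boolean product that BooMF inverts can be sketched as follows. This shows only the AND/OR reconstruction step, not the Bayesian model or the Gibbs sampler; the matrices are made-up toy data and the names are hypothetical.

```python
from typing import List

BinaryMatrix = List[List[int]]


def boolean_product(u: BinaryMatrix, v: BinaryMatrix) -> BinaryMatrix:
    """Boolean product: entry (i, j) is OR over k of (u[i][k] AND v[k][j])."""
    rank = len(v)
    cols = len(v[0])
    return [
        [int(any(row[k] and v[k][j] for k in range(rank))) for j in range(cols)]
        for row in u
    ]


# Hypothetical toy data: 3 patients x 2 patterns, 2 patterns x 4 chromosome arms.
patients = [[1, 0], [0, 1], [1, 1]]
patterns = [[1, 1, 0, 0], [0, 1, 1, 0]]
reconstructed = boolean_product(patients, patterns)
```

The third patient carries both patterns, and the shared second arm is reconstructed as 1 rather than 2, which is the difference from an additive factorization.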

2602.11557 2026-06-18 cs.LG stat.ML 版本更新

The Implicit Bias of Steepest Descent with Mini-batch Stochastic Gradient

小批量随机梯度最陡下降的隐式偏差

Jichu Li, Xuan Tang, Difan Zou

AI总结研究小批量随机最陡下降在多类分类中的隐式偏差，揭示批大小、动量和方差缩减对最大间隔行为和收敛率的影响，并证明动量可实现小批量收敛，方差缩减可恢复全批量隐式偏差。

详情

AI中文摘要

多种广泛使用的优化方法，如SignSGD和Muon，可以被解释为在不同范数诱导几何下的最陡下降实例。在这项工作中，我们研究了多类分类中小批量随机最陡下降的隐式偏差，刻画了批大小、动量和方差缩减如何在一般逐项和Schatten-$p$范数下塑造极限最大间隔行为和收敛率。我们证明，在没有动量时，最坏情况下的收敛和成功分类只能通过全批量梯度保证。相反，动量通过批量-动量权衡使得小批量收敛到近似最大间隔解成为可能，尽管会减慢收敛速度。该方法提供了完全显式、与维度无关的收敛率，优于先前的结果。此外，我们证明方差缩减可以恢复任意批大小下的精确全批量隐式偏差，尽管收敛速度较慢。最后，我们进一步研究了无动量的单批量最陡下降，并通过一个具体数据示例揭示了其收敛到根本不同偏差的特性，这揭示了纯随机更新的一个关键局限性。总体而言，我们的统一分析阐明了随机优化何时与全批量行为一致，并为更深入地探索随机梯度最陡下降算法的训练行为铺平了道路。

英文摘要

A variety of widely used optimization methods like SignSGD and Muon can be interpreted as instances of steepest descent under different norm-induced geometries. In this work, we study the implicit bias of mini-batch stochastic steepest descent in multi-class classification, characterizing how batch size, momentum, and variance reduction shape the limiting max-margin behavior and convergence rates under general entry-wise and Schatten-$p$ norms. We show that, without momentum, worst-case convergence and successful classification can only be guaranteed with full-batch gradients. In contrast, momentum enables small-batch convergence to an approximate max-margin solution through a batch-momentum trade-off, though it slows convergence. This approach provides fully explicit, dimension-free rates that improve upon prior results. Moreover, we prove that variance reduction can recover the exact full-batch implicit bias for any batch size, albeit at a slower convergence rate. Finally, we investigate batch-size-one steepest descent without momentum and show, via a concrete data example, that it converges to a fundamentally different bias, which exposes a key limitation of purely stochastic updates. Overall, our unified analysis clarifies when stochastic optimization aligns with full-batch behavior and paves the way for deeper explorations of the training behavior of stochastic gradient steepest descent algorithms.
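
As a concrete instance of steepest descent under a norm-induced geometry: with respect to the entry-wise max norm, the steepest descent direction is the negative sign of the gradient, which gives a SignSGD-style update. The sketch below shows that single step only, with no momentum or variance reduction; the function names are assumptions made for illustration.

```python
from typing import List, Sequence


def sign(value: float) -> int:
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def sign_descent_step(
    x: Sequence[float], grad: Sequence[float], lr: float
) -> List[float]:
    """One steepest-descent step under the max norm: move each coordinate by lr against the gradient sign."""
    return [xi - lr * sign(gi) for xi, gi in zip(x, grad)]
```

Every coordinate with a nonzero gradient moves by the same amount `lr`, regardless of the gradient's magnitude, which is what distinguishes this geometry from Euclidean gradient descent.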

2411.16206 2026-06-18 cs.LG cs.AI cs.NE 版本更新

Scalable Batch Bayesian Optimization Via Subspace Acquisition Functions

可扩展的批量贝叶斯优化：基于子空间采集函数

Dawei Zhan, Zhaoxi Zeng, Shuoxiao Wei, Ping Wu

发表机构 * School of Computing and Artificial Intelligence（计算与人工智能学院）

AI总结提出通过从原始问题的轴对齐子空间中各选一点来扩展贝叶斯优化至大规模批量评估，显著加速收敛，与十种批量算法相比极具竞争力。

详情

DOI: 10.1145/3820495
Journal ref: ACM Transactions on Evolutionary Learning and Optimization, 2026

AI中文摘要

将贝叶斯优化扩展到批量评估可以使设计者充分利用并行计算技术。然而，当前大多数批量方法在批量大小增大时扩展性不佳，优化效率往往下降。为解决此问题，本文提出一种简单高效的方法，将贝叶斯优化扩展到大规模批量评估。与现有批量方法不同，新方法的思想是从原始问题中抽取一批轴对齐子空间，并使用现有采集函数从每个子空间中选择一个点。数值实验表明，与顺序贝叶斯优化算法相比，我们提出的方法显著加速收敛，并且与十种批量贝叶斯优化算法相比表现非常有竞争力。我们提出的方法的实现可在 https://github.com/zhandawei/SubSpace_Acquisition_Functions 获取。

英文摘要

Extending Bayesian optimization to batch evaluation enables the designer to make full use of parallel computing technology. However, most current batch approaches do not scale well with the batch size; that is, their optimization efficiency often deteriorates as the batch size increases. To address this issue, we propose a simple and efficient approach to extend Bayesian optimization to large-scale batch evaluation. Different from existing batch approaches, the idea of the new approach is to draw a batch of axis-aligned subspaces of the original problem and select one point from each subspace using existing acquisition functions. Numerical experiments show that our proposed approach speeds up convergence significantly when compared with the sequential Bayesian optimization algorithm, and performs very competitively when compared with ten batch Bayesian optimization algorithms. The implementation of our proposed approach is available at https://github.com/zhandawei/SubSpace_Acquisition_Functions.
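
The batch construction described above (one point per axis-aligned subspace) might be sketched as follows. This is not the authors' implementation, which is in the linked repository. Several details are assumptions made here: the non-selected coordinates are held at the incumbent's values, the inner maximization is a plain random search, the domain is the unit hypercube, and `acquisition` is any user-supplied score to maximize.

```python
import random
from typing import Callable, List, Sequence


def subspace_batch(
    acquisition: Callable[[Sequence[float]], float],
    incumbent: Sequence[float],
    batch_size: int,
    subspace_dim: int,
    n_candidates: int = 200,
    seed: int = 0,
) -> List[List[float]]:
    """Pick one point per random axis-aligned subspace by maximising `acquisition`."""
    rng = random.Random(seed)
    dim = len(incumbent)
    batch = []
    for _ in range(batch_size):
        # An axis-aligned subspace: a random subset of coordinates is free,
        # the remaining coordinates stay fixed at the incumbent's values.
        free = rng.sample(range(dim), subspace_dim)
        best_point, best_value = None, float("-inf")
        for _ in range(n_candidates):
            candidate = list(incumbent)
            for axis in free:
                candidate[axis] = rng.uniform(0.0, 1.0)
            value = acquisition(candidate)
            if value > best_value:
                best_point, best_value = candidate, value
        batch.append(best_point)
    return batch
```

Because each batch member only searches a low-dimensional slice, the per-point optimization cost does not grow with the batch size, and the members can be evaluated in parallel.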

2506.08764 2026-06-18 cs.LG 版本更新

On the Stability of the Jacobian Matrix in Deep Neural Networks

深度神经网络中雅可比矩阵的稳定性

Benjamin Dadoun, Soufiane Hayou, Hanan Salam, Mohamed El Amine Seddik, Pierre Youssef

AI总结本文利用随机矩阵理论，建立了深度神经网络中雅可比矩阵谱稳定性的通用定理，适用于稀疏和非独立同分布权重，扩展了初始化方案的理论基础。

Comments 21 pages, 28 figures; the main theorem was wrong (again) and is now corrected

2509.14969 2026-06-18 cs.LG math.OC stat.ML 版本更新

Stochastic Adaptive Gradient Descent Without Descent

无需下降的随机自适应梯度下降

Jean-François Aujol, Jérémie Bigot, Camille Castera

发表机构 * Univ. Bordeaux CNRS, Bordeaux INP, IMB, UMR 5251（波尔多大学 CNRS，波尔多 INP，IMB，UMR 5251）

AI总结提出一种无需超参数调优的随机梯度自适应步长策略，利用一阶随机Oracle的局部几何信息，理论证明收敛性，实验与调优基线竞争。

2602.14789 2026-06-18 cs.LG stat.ML 版本更新

On the Stability of Nonlinear Dynamics in GD and SGD: Beyond Quadratic Potentials

关于GD和SGD中非线性动力学的稳定性：超越二次势能

Rotem Mulayoff, Sebastian U. Stich

发表机构 * CISPA Helmholtz Center for Information Security（CISPA赫尔姆霍兹信息安全中心）

AI总结研究梯度下降和随机梯度下降中非线性项对动力学稳定性的影响，推导了多元设置下稳定振荡的精确条件，并发现SGD的稳定性由单个不稳定批次决定。

Comments Accepted to COLT 2026

详情

AI中文摘要

训练过程中迭代的动力稳定性在确定优化算法所获得的极小值方面起着关键作用。例如，梯度下降（GD）的稳定解对应于平坦极小值，而平坦极小值被认为具有有利特征。虽然先前的工作通常依赖线性化来确定稳定性，但线性化动力学是否忠实捕捉完整的非线性行为仍不清楚。最近的研究表明，GD可能在线性不稳定的极小值附近稳定振荡，并在步长衰减后收敛，这表明线性分析可能具有误导性。在这项工作中，我们明确研究了非线性项的影响。具体而言，我们在多元设置下推导了GD在极小值附近稳定振荡的精确准则。我们的条件依赖于高阶导数，推广了现有结果。将分析扩展到随机梯度下降（SGD），我们表明即使单个批次不稳定，非线性动力学也可能在期望上发散。这意味着稳定性可能由单个不稳定振荡的批次决定，而非线性分析所暗示的平均效应。最后，我们证明如果所有批次都是线性稳定的，则SGD的非线性动力学在期望上是稳定的。

英文摘要

The dynamical stability of the iterates during training plays a key role in determining the minima obtained by optimization algorithms. For example, stable solutions of gradient descent (GD) correspond to flat minima, which have been associated with favorable features. While prior work often relies on linearization to determine stability, it remains unclear whether linearized dynamics faithfully capture the full nonlinear behavior. Recent work has shown that GD may stably oscillate near a linearly unstable minimum and still converge once the step size decays, indicating that linear analysis can be misleading. In this work, we explicitly study the effect of nonlinear terms. Specifically, we derive an exact criterion for stable oscillations of GD near minima in the multivariate setting. Our condition depends on high-order derivatives, generalizing existing results. Extending the analysis to stochastic gradient descent (SGD), we show that nonlinear dynamics can diverge in expectation even if only a single batch is unstable. This implies that stability can be dictated by a single batch that oscillates unstably, rather than an average effect, as linear analysis suggests. Finally, we prove that if all batches are linearly stable, the nonlinear dynamics of SGD are stable in expectation.
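
The linearized criterion that the abstract contrasts against can be checked on a one-dimensional quadratic, where each GD step multiplies the iterate by (1 - step_size * curvature), so the iterates shrink exactly when step_size < 2 / curvature. The sketch below covers only this linear case, not the paper's nonlinear analysis; the function name is an assumption.

```python
def gd_on_quadratic(curvature: float, step_size: float, x0: float, steps: int) -> float:
    """Run gradient descent on f(x) = 0.5 * curvature * x**2 and return the final |x|."""
    x = x0
    for _ in range(steps):
        x -= step_size * curvature * x  # gradient of f is curvature * x
    return abs(x)


# With curvature 4.0 the linear stability threshold is step_size = 0.5.
stable = gd_on_quadratic(curvature=4.0, step_size=0.4, x0=1.0, steps=50)
unstable = gd_on_quadratic(curvature=4.0, step_size=0.6, x0=1.0, steps=50)
```

Below the threshold the iterate contracts by a factor of 0.6 per step; above it the iterate grows by a factor of 1.4 per step while flipping sign, which is the oscillatory divergence that linear analysis predicts.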

2605.04267 2026-06-18 cs.LG cs.NE math.OC 版本更新

QUIVER: Cost-Aware Adaptive Preference Querying in Surrogate-Assisted Evolutionary Multi-Objective Optimization

QUIVER: 代理辅助多目标进化优化中的成本自适应偏好查询

Florian A. D. Burnat

发表机构 * University of Warwick（沃里克大学）； Warwick Business School（沃里克商学院）

AI总结提出QUIVER方法，通过自适应选择目标评估与异质偏好查询（成对偏好陈述与无差异调整），在代理辅助多目标优化中最小化决策遗憾，实验显示在WFG难题上效用遗憾降低25%。

Comments Accepted at Genetic and Evolutionary Computation Conference (GECCO '26)

详情

DOI: 10.1145/3795095.3805174

AI中文摘要

交互式多目标优化系统面临预算分配困境：资源可用于昂贵的目标评估，或用于引出决策者偏好以识别帕累托集的相关区域。此外，偏好引出本身跨越具有不同信息内容和认知负担的模态，从廉价、嘈杂的成对偏好陈述（PS）到更丰富但成本更高的无差异调整（IA）。我们研究了未知标量化下的成本感知优化，并引入了QUIVER（查询信息价值估计遗憾），这是一种代理辅助的进化多目标优化器，可自适应地在目标评估和异质偏好查询之间进行选择。在每一步，QUIVER通过最大化每单位总成本的预期决策质量改进来选择下一个动作。在合成决策者模型下的DTLZ和WFG基准测试中，QUIVER在具有挑战性的WFG问题上实现了最低的最终效用遗憾（WFG4上效用遗憾为2.14，WFG9上为2.82：比基线提高25%），优于所有单模态基线。我们分析了PS和IA的最优混合如何适应问题难度：在简单问题（DTLZ2）上，QUIVER选择80%的PS查询；在困难问题（WFG9）上，它转向35%的IA查询。这种自适应模态选择展示了成本感知偏好学习的实际应用。

英文摘要

Interactive multi-objective optimization systems face a budget allocation dilemma: one can spend resources on expensive objective evaluations or on eliciting decision-maker preferences that identify the relevant region of the Pareto set. Moreover, preference elicitation itself spans modalities with different information content and cognitive burden, ranging from cheap, noisy pairwise preference statements (PS) to richer but costlier indifference adjustments (IA). We study cost-aware optimization under an unknown scalarization and introduce QUIVER (Query-Informed Value Estimation for Regret), a surrogate-assisted evolutionary multi-objective optimizer that adaptively chooses between objective evaluations and heterogeneous preference queries. At each step, QUIVER selects the next action by maximizing the expected decision-quality improvement per unit total cost. Across DTLZ and WFG benchmarks under synthetic decision-maker models, QUIVER achieves the lowest final utility regret on challenging WFG problems (utility regret of 2.14 on WFG4, 2.82 on WFG9: a 25% improvement over baselines), outperforming all single-modality baselines. We analyze how the optimal mix of PS and IA adapts to problem difficulty: on easy problems (DTLZ2), QUIVER selects 80% PS queries; on hard problems (WFG9), it shifts to 35% IA queries. This adaptive modality selection demonstrates cost-aware preference learning in action.
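
The selection rule stated in the abstract (maximize expected decision-quality improvement per unit cost) reduces to an argmax over ratios. In the sketch below, the action names, improvement estimates, and costs are invented numbers for illustration; how QUIVER actually estimates the expected improvement is not described in the abstract and is not modeled here.

```python
from typing import Dict, Tuple


def pick_action(options: Dict[str, Tuple[float, float]]) -> str:
    """Return the action with the largest expected improvement per unit cost."""
    return max(options, key=lambda name: options[name][0] / options[name][1])


# Hypothetical numbers: (expected decision-quality improvement, cost).
options = {
    "objective_evaluation": (0.30, 10.0),
    "pairwise_statement": (0.05, 1.0),
    "indifference_adjustment": (0.12, 3.0),
}
chosen = pick_action(options)
```

Here the cheap pairwise statement wins with a ratio of 0.05 per unit cost, even though the objective evaluation has the largest raw improvement.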

2505.15215 2026-06-18 stat.ML cs.LG stat.ME 版本更新

Task-Adaptive Parameter-Efficient Fine-Tuning for Weather Foundation Models

面向天气基础模型的任务自适应参数高效微调

Shilei Cao, Hehai Lin, Jiashun Cheng, Yang Liu, Guowen Li, Xuehe Wang, Juepeng Zheng, Haoyuan Liang, Meng Jin, Chengwei Qin, Hong Cheng, Haohuan Fu

发表机构 * Sun Yat-sen University（中山大学）； The Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））； The Hong Kong University of Science and Technology（香港科技大学）； The Chinese University of Hong Kong（香港中文大学）； National Supercomputing Center in Shenzhen（深圳国家超算中心）； Huawei Technologies Co., Ltd（华为技术有限公司）； Tsinghua University（清华大学）

AI总结提出WeatherPEFT框架，通过任务自适应动态提示和随机Fisher引导自适应选择，在天气下游任务上以更少参数达到全微调性能。

详情

AI中文摘要

尽管机器学习的最新进展使天气基础模型（WFM）在多种下游任务中具备了强大的泛化能力，但随着模型规模扩大，计算需求不断攀升，实际部署愈发困难。当前为视觉或语言任务设计的参数高效微调（PEFT）方法无法应对天气下游任务的独特挑战，如变量异质性、分辨率多样性和时空覆盖变化，导致在WFM上性能欠佳。为弥补这一差距，我们提出WeatherPEFT，一种新颖的PEFT框架，包含两项协同创新。首先，在前向传播中，任务自适应动态提示（TADP）通过内部和外部模式提取，将编码器中的嵌入权重动态注入预训练骨干网络的输入令牌，实现针对特定下游任务的上下文感知特征重校准。其次，在反向传播中，随机Fisher引导自适应选择（SFAS）不仅利用Fisher信息识别并更新最关键的任务参数，从而保留不变的预训练知识，还引入随机性以稳定选择过程。我们在三个下游任务上验证了WeatherPEFT的有效性和效率，现有PEFT方法与全微调相比存在显著差距，而WeatherPEFT使用更少的可训练参数达到了与全微调相当的性能。本工作代码见 https://github.com/ShileiCao/WeatherPEFT。

英文摘要

While recent advances in machine learning have equipped Weather Foundation Models (WFMs) with substantial generalization capabilities across diverse downstream tasks, the escalating computational requirements associated with their expanding scale increasingly hinder practical deployment. Current Parameter-Efficient Fine-Tuning (PEFT) methods, designed for vision or language tasks, fail to address the unique challenges of weather downstream tasks, such as variable heterogeneity, resolution diversity, and spatiotemporal coverage variations, leading to suboptimal performance when applied to WFMs. To bridge this gap, we introduce WeatherPEFT, a novel PEFT framework for WFMs incorporating two synergistic innovations. First, during the forward pass, Task-Adaptive Dynamic Prompting (TADP) dynamically injects the embedding weights within the encoder to the input tokens of the pre-trained backbone via internal and external pattern extraction, enabling context-aware feature recalibration for specific downstream tasks. Furthermore, during backpropagation, Stochastic Fisher-Guided Adaptive Selection (SFAS) not only leverages Fisher information to identify and update the most task-critical parameters, thereby preserving invariant pre-trained knowledge, but also introduces randomness to stabilize the selection. We demonstrate the effectiveness and efficiency of WeatherPEFT on three downstream tasks, where existing PEFT methods show significant gaps versus Full-Tuning, and WeatherPEFT achieves performance parity with Full-Tuning using fewer trainable parameters. The code of this work is available at https://github.com/ShileiCao/WeatherPEFT.

2601.21626 2026-06-18 cs.LG cs.AI 版本更新

HeRo-Q: A General Framework for Stable Low Bit Quantization via Hessian Conditioning

HeRo-Q: 通过Hessian条件化实现稳定低比特量化的通用框架

Jinhao Zhang, Yunquan Zhang, Zicheng Yan, Boyang Zhang, Jun Sun, Daning Cheng

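Affiliations: Beijing University of Posts and Telecommunications; Institute of Computing Technology, Chinese Academy of Sciences; University of Science and Technology of China; Zhejiang Lab; Peng Cheng Laboratory

AI summary: To address the "low error, high loss" paradox in post-training quantization, the paper proposes HeRo-Q, which reshapes the loss landscape with a lightweight, learnable rotation-compression matrix, lowers the largest Hessian eigenvalue, and improves robustness to quantization noise; it outperforms existing methods on Llama and Qwen models.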

AI中文摘要

后训练量化（PTQ）是一种主流的模型压缩技术，但由于其仅专注于最小化量化误差，常常导致矛盾的“低误差、高损失”现象。根本原因在于LLM损失景观的Hessian矩阵：少数高曲率方向对扰动极其敏感。为了解决这个问题，我们提出了Hessian鲁棒量化（HeRo Q）算法，该算法在量化前对权重空间应用一个轻量级、可学习的旋转压缩矩阵。这个联合框架通过降低最大的Hessian特征值并减小其最大特征值来重塑损失景观，从而显著增强对量化噪声的鲁棒性。HeRo-Q不需要修改架构，计算开销可忽略不计，并且可以无缝集成到现有的PTQ流程中。在Llama和Qwen模型上的实验表明，HeRo Q在标准W4A8设置下不仅持续优于包括GPTQ、AWQ和SpinQuant在内的最先进方法，而且在极具挑战性的W3A16超低比特场景中表现出色，将Llama3 8B在GSM8K上的准确率提升至70.15%，并有效避免了激进量化中常见的逻辑崩溃。

英文摘要

Post-Training Quantization (PTQ), a mainstream model compression technique, often leads to the paradoxical 'low error, high loss' phenomenon because it focuses solely on minimizing quantization error. The root cause lies in the Hessian matrix of the LLM loss landscape: a few high-curvature directions are extremely sensitive to perturbations. To address this, we propose the Hessian Robust Quantization (HeRo-Q) algorithm, which applies a lightweight, learnable rotation-compression matrix to the weight space prior to quantization. This joint framework reshapes the loss landscape by reducing the largest Hessian eigenvalue, thereby significantly enhancing robustness to quantization noise. HeRo-Q requires no architectural modifications, incurs negligible computational overhead, and integrates seamlessly into existing PTQ pipelines. Experiments on Llama and Qwen models show that HeRo-Q consistently outperforms state-of-the-art methods, including GPTQ, AWQ, and SpinQuant. It not only achieves superior performance under standard W4A8 settings but also excels in the highly challenging W3A16 ultra-low-bit regime, where it boosts GSM8K accuracy on Llama3 8B to 70.15% and effectively avoids the logical collapse commonly seen in aggressive quantization.
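The 'low error, high loss' effect described above can be illustrated with the usual second-order model of the loss change, ΔL ≈ ½ δᵀHδ. The sketch below is a toy and not HeRo-Q itself: the diagonal Hessian and the two perturbations are made-up values, chosen only to show that a smaller quantization error can still cost more loss when it falls along a high-curvature direction.

```python
def loss_increase(hessian_diag, delta):
    """Second-order estimate 0.5 * delta^T H delta for a diagonal Hessian."""
    return 0.5 * sum(h * d * d for h, d in zip(hessian_diag, delta))


def squared_error(delta):
    """Squared L2 norm of the weight perturbation (the 'quantization error')."""
    return sum(d * d for d in delta)


# Hypothetical curvature: one sharp direction and one flat direction.
hessian_diag = [100.0, 1.0]

small_but_sharp = [0.1, 0.0]  # small error, aligned with the sharp direction
large_but_flat = [0.0, 0.5]   # larger error, aligned with the flat direction

err_sharp = squared_error(small_but_sharp)
err_flat = squared_error(large_but_flat)
dl_sharp = loss_increase(hessian_diag, small_but_sharp)
dl_flat = loss_increase(hessian_diag, large_but_flat)

# Lowering the largest eigenvalue (here from 100 to 10, an arbitrary choice)
# shrinks the loss increase caused by the same perturbation.
dl_sharp_conditioned = loss_increase([10.0, 1.0], small_but_sharp)
```

In this toy setting the perturbation with the smaller error produces the larger loss increase, and reducing the top eigenvalue reduces that increase, which is the intuition behind conditioning the Hessian before quantizing.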


2602.00161 2026-06-18 cs.LG cs.AI cs.CL quant-ph 版本更新

LLM Compression by Block Removal with Constrained Binary Optimization

通过带约束二进制优化的块移除进行LLM压缩

David Jansen, Roman Rausch, Ali Hashemi, David Montero, Román Orús

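Affiliations: Multiverse Computing; Donostia International Physics Center; Ikerbasque Foundation for Science

AI summary: The paper formulates block-removal compression of large language models as a constrained binary optimization problem mapped to an Ising glass system, enabling efficient ranking and high-quality non-consecutive block removal; at 50% compression it improves MMLU by almost 23 percentage points, and it is computationally efficient and broadly applicable.

Comments 16 pages, 3 figures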

AI中文摘要

在本文中，我们将通过最优删除Transformer块（“块移除”）来压缩大语言模型（LLM）的问题，表述为一个约束二进制优化（CBO）问题，该问题可以映射到物理系统（Ising玻璃），其能量是下游模型性能的强代理。这种表述使得能够高效地对大量候选块移除配置进行排序，产生许多高质量、非平凡的解决方案，而不仅仅是移除连续区域。我们的方法在深度压缩场景中表现强劲，例如在Llama-3.3-70B-Instruct的50%压缩中，与其他最先进的块移除方法相比，我们在MMLU基准上取得了近23个百分点的提升。对于较轻的压缩，它在多个基准上与这些方法表现相当，适用于Llama-3.1-8B-Instruct、Qwen3-14B（重训练前后）以及Llama-3.3-70B-Instruct。该方法计算效率高，仅需在校准数据集上对少数活跃参数进行前向和反向传播。此外，我们证明，当无法精确求解CBO问题时，使用良好的启发式求解器可以在可忽略的运行时间内提供在下游任务上表现良好的解决方案。该方法可以轻松应用于任何架构。我们在最近的NVIDIA-Nemotron-3-Nano-30B-A3B-FP8模型上展示了这种通用性，该模型具有高度不均匀且具有挑战性的块结构，并且在移除2个注意力层或3个混合专家层时，我们在AIME25和GPQA上超越了最先进水平。

英文摘要

In this paper, we formulate the compression of large language models (LLMs) by optimally deleting transformer blocks ("block removal") as a constrained binary optimization (CBO) problem that can be mapped to a physical system (Ising glass), whose energies are a strong proxy for downstream model performance. This formulation enables an efficient ranking of a large number of candidate block-removal configurations, yielding many high-quality, non-trivial solutions beyond those only removing consecutive regions. Our method performs strongly in the deep compression regime, such as for 50% compression of Llama-3.3-70B-Instruct, where we achieve an almost 23 percentage point increase on the MMLU benchmark compared to other state-of-the-art (SOTA) block-removal methods. For lighter compression, it performs on par with those methods across several benchmarks for Llama-3.1-8B-Instruct, Qwen3-14B (both before and after retraining), as well as Llama-3.3-70B-Instruct. The approach is computationally efficient and requires only forward and backward passes on a calibration dataset for a few active parameters. Additionally, we demonstrate that using good heuristic solvers for the CBO problem provides solutions that perform well on downstream tasks in negligible runtime when it is infeasible to solve the problem exactly. The method can be readily applied to any architecture. We illustrate this generality on the recent NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 model, which exhibits a highly inhomogeneous and challenging block structure, and where we outperform SOTA for AIME25 and GPQA when removing either 2 attention layers or 3 mixture-of-experts layers.
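To make the constrained binary optimization view concrete, here is a minimal sketch that ranks block-removal masks by an Ising/QUBO-style energy under a fixed removal budget. The linear terms `h` and pairwise couplings `J` are invented numbers; the abstract does not say how the paper derives its coefficients, and exhaustive enumeration is used here only because the toy problem is tiny.

```python
from itertools import combinations


def energy(removed, h, J):
    """Energy of removing the blocks in `removed`: linear plus pairwise terms."""
    e = sum(h[i] for i in removed)
    e += sum(J.get((i, j), 0.0) for i, j in combinations(sorted(removed), 2))
    return e


def rank_configurations(n_blocks, n_remove, h, J):
    """All masks that remove exactly n_remove blocks, sorted by energy."""
    configs = combinations(range(n_blocks), n_remove)
    return sorted(configs, key=lambda c: energy(c, h, J))


# Hypothetical coefficients: h[i] is the cost of removing block i alone,
# J[(i, j)] the extra cost of removing blocks i and j together.
h = [0.9, 0.2, 0.3, 0.25, 0.8]
J = {(1, 2): 0.5, (1, 3): 0.05, (2, 3): 0.4}

ranking = rank_configurations(n_blocks=5, n_remove=2, h=h, J=J)
best = ranking[0]
```

With these numbers the lowest-energy mask removes blocks 1 and 3, a non-consecutive pair, because the coupling between the consecutive pairs (1, 2) and (2, 3) is larger. For realistic model depths the search space grows combinatorially, which is where heuristic solvers come in.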


2512.12850 2026-06-18 cs.AR cs.LG cs.SY eess.SY hep-ex 版本更新

KANELÉ: Kolmogorov-Arnold Networks for Efficient LUT-based Evaluation

KANELÉ：基于Kolmogorov-Arnold网络的高效LUT评估

Duc Hoang, Aarush Gupta, Philip Harris

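Affiliations: Massachusetts Institute of Technology

AI summary: The paper proposes KANELÉ, a framework that exploits the distinctive properties of Kolmogorov-Arnold Networks (KANs) and co-optimizes quantization and pruning to give the first systematic flow for efficient LUT mapping on FPGAs, with up to a 2700x speedup and large resource savings over prior approaches.

Comments International Symposium on Field-Programmable Gate Arrays 2026 (ISFPGA'2026)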

DOI: 10.1145/3748173.3779202

AI中文摘要

低延迟、资源高效的FPGA神经网络推理对于需要实时能力和低功耗的应用至关重要。基于查找表（LUT）的神经网络是一种常见解决方案，结合了强大的表示能力和高效的FPGA实现。在这项工作中，我们介绍了KANELÉ，一个利用Kolmogorov-Arnold网络（KAN）独特性质进行FPGA部署的框架。与传统的多层感知器（MLP）不同，KAN使用可学习的一维样条作为边缘激活函数，其域固定，这种结构天然适合离散化和高效的LUT映射。我们提出了第一个在FPGA上实现KAN的系统设计流程，通过量化与剪枝协同优化训练，以实现紧凑、高吞吐量和低延迟的KAN架构。我们的结果表明，与先前的KAN-on-FPGA方法相比，加速高达2700倍，并节省了数量级的资源。此外，KANELÉ在广泛使用的基准测试中匹配或超越了其他基于LUT的架构，特别是在涉及符号或物理公式的任务中，同时平衡了FPGA硬件上的资源使用。最后，我们通过将框架扩展到实时、高能效的控制系统，展示了其多功能性。

英文摘要

Low-latency, resource-efficient neural network inference on FPGAs is essential for applications demanding real-time capability and low power. Lookup table (LUT)-based neural networks are a common solution, combining strong representational power with efficient FPGA implementation. In this work, we introduce KANELÉ, a framework that exploits the unique properties of Kolmogorov-Arnold Networks (KANs) for FPGA deployment. Unlike traditional multilayer perceptrons (MLPs), KANs employ learnable one-dimensional splines with fixed domains as edge activations, a structure naturally suited to discretization and efficient LUT mapping. We present the first systematic design flow for implementing KANs on FPGAs, co-optimizing training with quantization and pruning to enable compact, high-throughput, and low-latency KAN architectures. Our results demonstrate up to a 2700x speedup and orders of magnitude resource savings compared to prior KAN-on-FPGA approaches. Moreover, KANELÉ matches or surpasses other LUT-based architectures on widely used benchmarks, particularly for tasks involving symbolic or physical formulas, while balancing resource usage across FPGA hardware. Finally, we showcase the versatility of the framework by extending it to real-time, power-efficient control systems.
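The reason a one-dimensional activation on a fixed domain maps well to a lookup table can be sketched in a few lines. This is a software illustration under assumptions, not the KANELÉ design flow: `math.tanh` stands in for a learned spline, and the domain [-2, 2] and the table size of 257 entries are arbitrary choices.

```python
import math


def build_lut(fn, lo, hi, n_entries):
    """Sample fn at n_entries evenly spaced points on the fixed domain [lo, hi]."""
    step = (hi - lo) / (n_entries - 1)
    return [fn(lo + i * step) for i in range(n_entries)]


def lut_eval(lut, lo, hi, x):
    """Nearest-entry lookup; inputs outside the domain are clamped to it."""
    x = min(max(x, lo), hi)
    idx = round((x - lo) / (hi - lo) * (len(lut) - 1))
    return lut[idx]


# tanh is a placeholder for a learned 1-D edge activation.
LO, HI = -2.0, 2.0
lut = build_lut(math.tanh, LO, HI, 257)

approx = lut_eval(lut, LO, HI, 0.3)
clamped = lut_eval(lut, LO, HI, 5.0)  # out-of-domain input hits the last entry
```

Because the domain is fixed, the index computation is a scale and a round, and the table size directly trades memory for approximation error.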


2602.02056 2026-06-18 cs.AR cs.LG cs.SY eess.SY stat.ML 版本更新

Ultrafast On-chip Online Learning via Spline Locality in Kolmogorov-Arnold Networks

基于Kolmogorov-Arnold网络中样条局部性的超快片上在线学习

Duc Hoang, Aarush Gupta, Philip Harris

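Affiliations: MIT

AI summary: Motivated by the need for sub-microsecond online learning in high-frequency systems such as quantum computing and nuclear fusion control, the paper exploits the B-spline locality of Kolmogorov-Arnold Networks to obtain sparse updates and robustness to fixed-point quantization, achieving ultrafast online learning on FPGAs that is more efficient and more expressive than MLPs.

Comments Forty-Third International Conference on Machine Learning (ICML'26)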

RUB: 评估未学习模型中的残留知识

Hao Xuan, Xingyu Li

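Affiliations: Electrical and Computer Engineering, University of Alberta

AI summary: The paper proposes the principle of robust unlearning and a unified benchmark, RUB, which uses the Unlearning Mapping Attack (UMA) to detect residual information and reveals the fragility of existing methods under adversarial evaluation.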

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2026, pages 8550-8559

AI中文摘要

机器未学习（MUL）已成为隐私保护和内容监管的关键机制，然而当前技术往往无法保证完全移除敏感信息。虽然现有工作大多关注验证未学习的执行，但它们忽略了模型在面对对抗性恢复遗忘知识尝试时是否保持鲁棒性的关键问题。在这项工作中，我们倡导鲁棒未学习原则，要求模型既与重新训练的模型不可区分，又能抵御多样化的对抗威胁。为实例化这一原则，我们提出了一个统一基准RUB（鲁棒未学习基准），系统评估未学习算法在分类、图像到图像重建和文本到图像合成中的鲁棒性。在此框架内，我们引入未学习映射攻击（UMA）作为检测残留信息的通用方法，并展示现有攻击策略如何适应此框架，只要它们符合通用UMA框架。我们在判别式和生成式任务上的实验表明，最先进的未学习方法在这些评估下仍然脆弱，即使通过了标准验证指标。通过将鲁棒性定位为核心标准并提供对抗评估基准，我们希望RUB能为更可靠和安全的未学习实践铺平道路。RUB中的代码库和模型检查点将公开发布。

英文摘要

Machine Unlearning (MUL) has emerged as a key mechanism for privacy protection and content regulation, yet current techniques often fail to guarantee the complete removal of sensitive information. While most existing works focus on verifying the execution of unlearning, they overlook the critical question of whether models remain robust against adversarial attempts to recover forgotten knowledge. In this work, we advocate for the principle of Robust Unlearning, which requires models to be both indistinguishable from retrained counterparts and resilient against diverse adversarial threats. To instantiate this principle, we propose a unified benchmark, RUB (Robust Unlearning Benchmark), that systematically evaluates the robustness of unlearning algorithms across classification, image-to-image reconstruction, and text-to-image synthesis. Within this framework, we introduce the Unlearning Mapping Attack (UMA) as a generalizable method to detect residual information, and demonstrate how existing attack strategies can be adapted into this framework as long as they conform to the generic UMA framework. Our experiments across discriminative and generative tasks reveal that state-of-the-art unlearning methods remain vulnerable under these evaluations, even when passing standard verification metrics. By positioning robustness as the central criterion and providing a benchmark for adversarial evaluation, we hope RUB paves the way toward more reliable and secure unlearning practices. The codebase and model checkpoints in RUB will be published.


2505.03646 2026-06-18 cs.LG cs.AI cs.CV 版本更新

Revealing Hidden Vulnerabilities in Autoencoders through Gradient Signal Restoration

通过梯度信号恢复揭示自编码器中的隐藏漏洞

Chethan Krishnamurthy Ramanaik, Arjun Roy, Tobias Callies, Eirini Ntoutsi

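Affiliations: University of the Bundeswehr Munich

AI summary: To address vanishing gradients in adversarial attacks on autoencoders, which cause robustness to be overestimated, the paper proposes the GRILL framework to restore the gradient signal, substantially improving attack effectiveness and exposing hidden vulnerabilities.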

AI中文摘要

深度自编码器（AE）的对抗鲁棒性受到的关注远少于判别模型，尽管其压缩的潜在表示会导致病态映射，从而放大小的输入扰动并破坏重建稳定性。现有的AE白盒攻击通过优化范数有界的对抗扰动以最大化重建损失，往往收敛到次优扰动，从而可能高估AE的鲁棒性。我们表明，这种限制与通过病态层反向传播时对抗损失梯度消失有关，这些病态层的中间权重矩阵具有接近零的奇异值。为了解决这个问题，我们提出了GRILL（病态层中的梯度信号恢复）框架，旨在减轻梯度退化并提高编码器-解码器架构中对抗鲁棒性评估的可靠性。GRILL旨在缓解优化过程中的对抗梯度退化，使攻击能够在固定范数约束下更好地逼近高失真扰动。通过在多种AE架构上的广泛实验，包括样本特定和通用攻击，以及标准和自适应攻击设置，我们表明GRILL显著提高了攻击有效性，从而暴露了现有攻击限制所隐藏的漏洞。除了AE之外，我们提供了初步证据表明现代多模态编码器-解码器架构也存在类似的漏洞。

英文摘要

Adversarial robustness of deep autoencoders (AEs) has received less attention than that of discriminative models, although their compressed latent representations induce ill-conditioned mappings that can amplify small input perturbations and destabilize reconstructions. Existing white-box attacks for AEs, which optimize norm-bounded adversarial perturbations to maximize reconstruction damage, often converge to suboptimal perturbations, thereby potentially overstating AE robustness. We show that this limitation is linked to vanishing adversarial loss gradients during backpropagation through ill-conditioned layers, associated with near-zero singular values in their intermediate weight matrices. To address this, we propose GRILL (Gradient Signal Restoration in Ill-Conditioned Layers), a framework designed to mitigate gradient degradation and improve the reliability of adversarial robustness evaluation in encoder-decoder architectures. GRILL is designed to mitigate adversarial gradient degradation during optimization, enabling attacks to better approximate high-distortion perturbations under fixed norm constraints. Through extensive experiments across multiple AE architectures, under both sample-specific and universal attacks, as well as standard and adaptive attack settings, we show that GRILL significantly increases attack effectiveness, thereby exposing vulnerabilities hidden by existing attack limitations. Beyond AEs, we provide preliminary evidence that modern multimodal encoder-decoder architectures exhibit similar vulnerabilities.


2606.16214 2026-06-18 cs.LG cs.AI 版本更新

Calibrated Sampling-Free Uncertainty Estimation in Bayesian Deep Learning

贝叶斯深度学习中的校准无采样不确定性估计

Tobias Jan Wieczorek, Leon de Andrade, Thomas Möllenhoff, Marcus Rohrbach

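Affiliations: TU Darmstadt & hessian.AI, Darmstadt, Germany; RIKEN Center for Advanced Intelligence Project, Tokyo, Japan

AI summary: The paper proposes Calibrated Variance Propagation (CVP), which combines a new propagation method for normalization layers, techniques for handling activation functions, and a lightweight calibration step to estimate uncertainty efficiently in a single forward pass, matching the accuracy of MC sampling on transformers and CNNs at a much lower cost.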

AI中文摘要

现代深度学习模型仍然以过度自信而闻名，限制了它们在高风险应用中的可靠性。贝叶斯方法通过学习模型参数的分布来应对这一问题，最近的进展使得在大规模架构上以与AdamW相当的成本实现这一目标成为可能。然而，测试时仍存在一个挑战：预测必须对从后验中采样的权重进行多次前向传播的平均，这代价高昂。方差传播提供了一种高效的替代方案，在单次前向传播中计算每层不确定性的解析近似。虽然此类技术对MLP有效，但由于现代架构的深度增加和层类型多样性，其扩展仍然具有挑战性。为填补这一空白，我们提出了校准方差传播（CVP），它引入了一种新的归一化层传播方法，结合了处理激活函数的近期技术，并通过轻量校准步骤吸收残差误差。CVP在Transformer和CNN上产生与MC采样相当准确的不确定性估计，而成本仅为极小部分。与先前的方差传播工作相比，CVP在BEiT-3上对视觉推理（NLVR2）的$0.5\%$风险覆盖率从$8.2\%$提高到$14.6\%$，在ViLT上对VQAv2从$2.6\%$提高到$10.8\%$，且增益扩展到卷积架构。

英文摘要

Modern deep learning models remain notoriously prone to overconfidence, limiting their reliability in high-stakes applications. Bayesian methods aim to counter this by learning a distribution over model parameters, and recent advances now make this feasible for large-scale architectures at costs comparable to AdamW. However, a challenge remains at test time: predictions must be averaged across many forward passes with weights sampled from the posterior, which is prohibitively expensive. Variance propagation offers an efficient alternative, computing layer-wise analytical approximations of uncertainty in a single forward pass. While such techniques are effective for MLPs, their extension to modern architectures remains challenging, due to increased depth and diversity of layer types. To fill this gap, we propose Calibrated Variance Propagation (CVP), which introduces a new propagation method for normalization layers, combines it with recent techniques for handling activation functions, and absorbs residual error through a light calibration step. CVP yields comparably accurate uncertainty estimates to MC sampling across transformers and CNNs, at a fraction of the cost. Against prior variance propagation work, CVP improves coverage at $0.5\%$ risk from $8.2\%$ to $14.6\%$ with BEiT-3 on Visual Reasoning (NLVR2) and from $2.6\%$ to $10.8\%$ with ViLT on VQAv2, with gains extending to convolutional architectures.
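The basic idea of variance propagation, replacing many sampled forward passes with one analytical pass over means and variances, can be shown for a single linear unit. This is the textbook moment calculation for independent weights and inputs, offered as a sketch; it is not CVP's rule for normalization layers, nor its calibration step, neither of which the abstract specifies.

```python
def propagate_linear(w_mean, w_var, x_mean, x_var):
    """Mean and variance of y = sum_i w_i * x_i for independent w_i and x_i.

    Uses Var[w*x] = Var[w] * (E[x]^2 + Var[x]) + E[w]^2 * Var[x].
    """
    y_mean = sum(m * xm for m, xm in zip(w_mean, x_mean))
    y_var = sum(
        wv * (xm * xm + xv) + m * m * xv
        for m, wv, xm, xv in zip(w_mean, w_var, x_mean, x_var)
    )
    return y_mean, y_var


# Hypothetical posterior over two weights and an uncertain two-dimensional input.
w_mean, w_var = [1.0, -2.0], [0.25, 0.5]
x_mean, x_var = [2.0, 1.0], [0.0, 1.0]

y_mean, y_var = propagate_linear(w_mean, w_var, x_mean, x_var)
```

A Monte Carlo estimate would converge to the same two numbers only as the number of sampled forward passes grows, which is the cost this family of methods avoids.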


2508.02158 2026-06-18 cs.IT cs.CR cs.DS cs.LG math.IT math.ST stat.TH 版本更新

Robust Detection of Planted Subgraphs in Semi-Random Models

半随机模型中植入子图的鲁棒检测

Dor Elimelech, Wasim Huleihel

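AI summary: The paper studies planted subgraph detection under a semi-random model, proving that in the presence of an adversary the detection of subgraphs with strongly sub-logarithmic density is information-theoretically impossible, whereas the statistical limits for super-logarithmic density are unchanged, and it designs an efficient, robust detection algorithm.

Comments 38 pages, 2 figures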

AI中文摘要

在Erdös-Rényi随机图中检测植入子图已被广泛研究，产生了丰富的刻画统计和计算阈值的结果。然而，大多数先前的工作假设纯随机生成模型，使得所得算法在面对现实扰动时可能脆弱。本文开创性地研究了植入子图检测问题的半随机模型，其中允许对抗者在图被揭示给统计学家之前移除植入子图外的边。关键的是，统计学家仍然不知道哪些边被移除，这给推理任务带来了根本性挑战。我们建立了该半随机模型下检测的基本统计极限，揭示了尖锐的二分性。具体而言，对于具有强次对数最大密度的植入子图，在存在对抗者的情况下检测在信息论上变得不可能——尽管在经典随机模型中某些植入子图是可能的。与此形成鲜明对比的是，对于具有超对数密度的子图，统计极限基本保持不变；我们证明最优（尽管计算上不可行）的似然比检验仍然是鲁棒的。在这些统计边界之外，我们设计了一种新的计算高效且鲁棒的检测算法，并为其性能提供了严格的统计保证。我们的结果为植入子图检测建立了第一个鲁棒框架，并为半随机模型、计算-统计权衡和图推理问题中的鲁棒性研究开辟了新方向。

英文摘要

Detection of planted subgraphs in Erdős-Rényi random graphs has been extensively studied, leading to a rich body of results characterizing both statistical and computational thresholds. However, most prior work assumes a purely random generative model, making the resulting algorithms potentially fragile in the face of real-world perturbations. In this work, we initiate the study of semi-random models for the planted subgraph detection problem, wherein an adversary is allowed to remove edges outside the planted subgraph before the graph is revealed to the statistician. Crucially, the statistician remains unaware of which edges have been removed, introducing fundamental challenges to the inference task. We establish fundamental statistical limits for detection under this semi-random model, revealing a sharp dichotomy. Specifically, for planted subgraphs with strongly sub-logarithmic maximum density, detection becomes information-theoretically impossible in the presence of an adversary, despite being possible for some planted subgraphs in the classical random model. In stark contrast, for subgraphs with super-logarithmic density, the statistical limits remain essentially unchanged; we prove that the optimal (albeit computationally intractable) likelihood ratio test remains robust. Beyond these statistical boundaries, we design a new computationally efficient and robust detection algorithm, and provide rigorous statistical guarantees for its performance. Our results establish the first robust framework for planted subgraph detection and open new directions in the study of semi-random models, computational-statistical trade-offs, and robustness in graph inference problems.


2602.21160 2026-06-18 stat.ML cs.LG stat.AP stat.ME 版本更新

Not Just How Much, But Where: Decomposing Epistemic Uncertainty into Per-Class Contributions

不仅多少，而且何处：将认知不确定性分解为每类贡献

Mame Diarra Toure, David A. Stephens

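Affiliations: Department of Mathematics and Statistics

AI summary: Because epistemic uncertainty measures in safety-critical classification cannot distinguish between classes, the paper decomposes mutual information into a per-class vector $C_k$ via a second-order Taylor expansion, with a $1/\mu_k$ weighting that corrects boundary suppression, and validates it on selective prediction for diabetic retinopathy, out-of-distribution detection, and a label-noise study.

Comments 8 pages, 17 figures. Accepted at UAI 2026

Journal ref: Forty-Second Annual Conference on Uncertainty in Artificial Intelligence, 2026. URL: https://openreview.net/forum?id=cxuWscJmAr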

AI中文摘要

在安全关键分类中，失败的代价往往是不对称的，然而贝叶斯深度学习用单个标量——互信息（MI）来总结认知不确定性，这无法区分模型的无知涉及良性类别还是安全关键类别。我们将MI分解为每类向量$C_k(x)=\sigma_k^{2}/(2\mu_k)$，其中$\mu_k{=}\mathbb{E}[p_k]$，$\sigma_k^2{=}\mathrm{Var}[p_k]$，计算基于后验样本。该分解来自熵的二阶泰勒展开；$1/\mu_k$加权校正了边界抑制，使$C_k$在稀有类别和常见类别之间具有可比性。根据构造，$\sum_k C_k \approx \mathrm{MI}$，并且伴随的偏度诊断标志可识别近似退化的输入。在刻画$C_k$的公理性质后，我们在三个任务上验证了它：（i）糖尿病视网膜病变的选择性预测，其中关键类别的$C_k$相比MI降低了34.7%的选择性风险，相比方差基线降低了56.2%；（ii）临床和图像基准上的分布外检测，其中$\sum_k C_k$取得了最高的AUROC，并且每类视角暴露了MI无法察觉的不对称偏移；（iii）受控的标签噪声研究，其中在端到端贝叶斯训练下，$\sum_k C_k$对注入的偶然噪声的敏感性低于MI，而在迁移学习下两种度量均退化。在所有任务中，后验近似的质量对不确定性的影响至少与度量选择本身一样强，这表明不确定性如何通过网络传播与其如何被度量同等重要。

英文摘要

In safety-critical classification, the cost of failure is often asymmetric, yet Bayesian deep learning summarises epistemic uncertainty with a single scalar, mutual information (MI), that cannot distinguish whether a model's ignorance involves a benign or safety-critical class. We decompose MI into a per-class vector $C_k(x)=\sigma_k^{2}/(2\mu_k)$, with $\mu_k=\mathbb{E}[p_k]$ and $\sigma_k^2=\mathrm{Var}[p_k]$ across posterior samples. The decomposition follows from a second-order Taylor expansion of the entropy; the $1/\mu_k$ weighting corrects boundary suppression and makes $C_k$ comparable across rare and common classes. By construction $\sum_k C_k \approx \mathrm{MI}$, and a companion skewness diagnostic flags inputs where the approximation degrades. After characterising the axiomatic properties of $C_k$, we validate it on three tasks: (i) selective prediction for diabetic retinopathy, where critical-class $C_k$ reduces selective risk by 34.7% over MI and 56.2% over variance baselines; (ii) out-of-distribution detection on clinical and image benchmarks, where $\sum_k C_k$ achieves the highest AUROC and the per-class view exposes asymmetric shifts invisible to MI; and (iii) a controlled label-noise study in which $\sum_k C_k$ shows less sensitivity to injected aleatoric noise than MI under end-to-end Bayesian training, while both metrics degrade under transfer learning. Across all tasks, the quality of the posterior approximation shapes uncertainty at least as strongly as the choice of metric, suggesting that how uncertainty is propagated through the network matters as much as how it is measured.
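The decomposition stated in the abstract, $C_k = \sigma_k^2 / (2\mu_k)$ with $\sum_k C_k \approx \mathrm{MI}$, is simple enough to check numerically. The sketch below computes both quantities from a set of posterior predictive samples; the two sample vectors are made up for illustration, and the population variance (dividing by the number of samples) is an assumption, since the abstract does not say which estimator the paper uses.

```python
import math


def class_means(samples):
    """Per-class mean probability across posterior samples."""
    n = len(samples)
    return [sum(s[j] for s in samples) / n for j in range(len(samples[0]))]


def per_class_contributions(samples):
    """C_k = Var[p_k] / (2 * E[p_k]), using the population variance."""
    n = len(samples)
    mu = class_means(samples)
    var = [sum((s[j] - mu[j]) ** 2 for s in samples) / n for j in range(len(mu))]
    return [v / (2.0 * m) for v, m in zip(var, mu)]


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def mutual_information(samples):
    """Entropy of the mean prediction minus the mean entropy of the samples."""
    mean_entropy = sum(entropy(s) for s in samples) / len(samples)
    return entropy(class_means(samples)) - mean_entropy


# Two hypothetical posterior samples over two classes.
samples = [[0.6, 0.4], [0.4, 0.6]]
c = per_class_contributions(samples)
mi = mutual_information(samples)
```

For these samples each class contributes about 0.01, and the sum is close to the exact mutual information of roughly 0.0201 nats; the gap comes from the truncated Taylor expansion.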


2504.04739 2026-06-18 cs.LG cs.CY 版本更新

UST-GNN: A Unified Spatial–Topological Graph Neural Network Framework for Urban Analytics – Demonstrated through a Case Study on Urban Health Prediction

UST-GNN：面向城市分析的空间-拓扑统一图神经网络框架——以城市健康预测为例

Minwei Zhao, Sanja Scepanovic, Stephen Law, Ivica Obadic, Cai Wu, Daniele Quercia

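Affiliations: University College London; The Hong Kong University of Science; Nokia Bell Labs; Technical University of Munich; University of Oxford

AI summary: The paper proposes the UST-GNN framework, which integrates neighbourhood connectivity, heterogeneous urban features, and positional embeddings; in health prediction across 4,835 Greater London neighbourhoods it improves $R^2$ by 8.4-13.2% under strict spatial cross-validation, and it introduces a principal-component module to interpret the embeddings.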

AI中文摘要

理解社会、人口、环境与空间因素如何共同塑造城市结果，对于可持续城市发展和循证政策至关重要。传统统计方法往往难以捕捉复杂的非线性关系，而许多机器学习方法忽视了城市系统中空间自相关和网络拓扑的共同作用。近期GeoAI的进展仅部分解决了这些挑战，通常将空间效应、图结构、评估和可解释性分开处理。我们提出\textbf{UST-GNN}，一个统一的空间-拓扑图神经网络框架，将邻域连通性、异质城市特征和位置/区位嵌入整合到单一表示中。使用MedSAT数据集（包含大伦敦4835个邻域的150多个环境和社会人口变量及六种处方结果），UST-GNN在严格空间交叉验证下，比强统计基线、地理增强基线和图机器学习基线表现更优，样本外$R^2$提升8.4-13.2%。我们进一步引入轻量级主成分模块，从地理角度解释学习到的节点嵌入，并将其与政策相关的协变量联系起来。结果分析恢复了已知模式，为有争议的关联提供了新视角，并揭示了值得进一步因果研究的新预测因子。这些发现共同证明了基于图的空间机器学习在城市健康分析、环境不平等评估和循证城市政策中的价值。除预测增益外，UST-GNN提供了一个统一的GeoAI分析流程，可嵌入城市数字孪生工作流，用于情景测试、监测和数据驱动的决策，以建设更健康、更可持续的城市。

英文摘要

Understanding how social, demographic, environmental, and spatial factors jointly shape urban outcomes is essential for sustainable urban development and evidence-based policy. Traditional statistical approaches often struggle to capture complex non-linear relationships, while many machine learning methods overlook the joint roles of spatial autocorrelation and network topology in urban systems. Recent advances in GeoAI have addressed these challenges only partially, often treating spatial effects, graph structure, evaluation, and interpretability separately. We present UST-GNN, a unified spatial–topological graph neural network framework that integrates neighbourhood connectivity, heterogeneous urban features, and positional/locational embeddings into a single representation. Using the MedSAT dataset, which contains over 150 environmental and socio-demographic variables and six prescription outcomes across 4,835 neighbourhoods in Greater London, UST-GNN outperforms strong statistical, geographically enhanced, and graph machine learning baselines, improving out-of-sample $R^2$ by 8.4–13.2% under strict spatial cross-validation. We further introduce a lightweight principal-component module to interpret learned node embeddings geographically and relate them to policy-relevant covariates. The resulting analyses recover established patterns, offer new perspectives on debated associations, and reveal novel predictors warranting further causal investigation. Together, these findings demonstrate the value of graph-based spatial machine learning for urban health analytics, environmental inequality assessment, and evidence-based urban policy. Beyond predictive gains, UST-GNN provides a unified GeoAI analytical pipeline that can be embedded into urban digital twin workflows for scenario testing, monitoring, and data-informed decision-making for healthier, more sustainable cities.


2606.15633 2026-06-18 cs.LG 版本更新

Formalizing and Mitigating Structural Distortion in LLM Attention for Graph Reasoning

形式化并缓解大语言模型注意力中的结构失真以实现零样本图推理

Donald Loveland, Puja Trivedi, Ari Weinstein, Edward W Huang, Danai Koutra

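Affiliations: University of Michigan; Amazon

AI summary: The paper formalizes the structural distortion mechanism that graph linearization introduces when large language models process text-attributed graphs, and proposes GaLA, a lightweight inference-time modification that corrects the attention bias to improve zero-shot graph reasoning performance.

Comments Accepted to KDD 2026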

AI中文摘要

大语言模型（LLM）在文本属性图（TAG）推理中展现出潜力。然而，将LLM应用于图需要将其结构线性化为序列，这引入了根源于图带宽问题的失真。虽然这种失真已被证明会降低性能，但通常归因于提示设计或模型规模，其潜在机制尚不清楚。在这项工作中，我们展示了旋转位置嵌入如何将图线性化为带宽相关的注意力衰减，抑制了序列化序列中被强制分隔开的图相邻节点之间的注意力。这将基于LLM的图推理的焦点从提示工程和规模缩放转向纠正注意力错位。受此分析启发，我们提出了图对齐语言注意力（GaLA），一种轻量级的、推理时修改LLM的方法。GaLA将注意力偏向图相邻节点，同时保留LLM的序列归纳偏差。在TAG基准测试中，GaLA以可忽略的开销提升了性能，表明失真是基于LLM的图推理中可纠正的瓶颈。

英文摘要

Large Language Models (LLMs) have shown promise for reasoning over Text-Attributed Graphs (TAGs). However, applying LLMs to graphs requires linearizing their structure into sequences, introducing distortion rooted in the graph bandwidth problem. While this distortion has been shown to degrade performance, it is often attributed to prompt design or model scale, leaving the underlying mechanism unclear. In this work, we show how rotary positional embeddings turn graph linearization into bandwidth-dependent attention decay, suppressing attention between graph-adjacent nodes that are forced far apart in the serialized sequence. This shifts the focus of LLM-based graph reasoning from prompt engineering and scaling toward correcting attention misalignment. Motivated by this analysis, we propose Graph-aligned Language Attention (GaLA), a lightweight, inference-time modification for LLMs. GaLA biases attention toward graph-adjacent nodes while preserving the LLM's sequential inductive biases. Across TAG benchmarks, GaLA improves performance with negligible overhead, demonstrating that distortion is a correctable bottleneck in LLM-based graph reasoning.
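One generic way to bias attention toward graph-adjacent nodes is to add a constant to the attention logits of adjacent pairs before the softmax. The sketch below shows that mechanism only; the additive form, the bias value, and the toy logits are assumptions made for illustration, since the abstract does not describe how GaLA actually computes its correction.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(logits, adjacency, bias):
    """Add `bias` to the logit of every graph-adjacent pair, then normalize."""
    return [
        softmax([l + bias * a for l, a in zip(l_row, a_row)])
        for l_row, a_row in zip(logits, adjacency)
    ]


# Toy logits that decay with sequence distance; nodes 0 and 2 are adjacent in
# the graph but far apart in the serialized sequence (hypothetical values).
logits = [[2.0, 1.0, -1.0], [1.0, 2.0, 1.0], [-1.0, 1.0, 2.0]]
adjacency = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]

plain = biased_attention(logits, adjacency, bias=0.0)
biased = biased_attention(logits, adjacency, bias=2.0)
```

Rows with no graph neighbours are left untouched, so the model's sequential behaviour is preserved wherever the graph has nothing to add.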


2505.12369 2026-06-18 cs.AI cs.LG cs.LO 版本更新

Fully Geometric Multi-Hop Reasoning on Knowledge Graphs with Transitive Relations

知识图谱上具有传递关系的全几何多跳推理

Fernando Zhapa-Camacho, Robert Hoehndorf

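Affiliations: KAUST Center of Excellence for Smart Health (KCSH); KAUST Center of Excellence for Generative AI

AI summary: The paper proposes GeometrE, which maps logical operations to purely geometric transformations and introduces a transitive loss function, improving multi-hop reasoning performance while retaining interpretability.

Comments Accepted at ESWC 2026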

DOI: 10.1007/978-3-032-25156-5_14
Journal ref: The Semantic Web. ESWC 2026. Lecture Notes in Computer Science, vol 16549. Springer, Cham (2026)

AI中文摘要

知识图谱上的多跳逻辑推理需要将逻辑语义忠实地映射到潜在空间。当前的几何嵌入方法通过将实体映射到几何区域、逻辑操作映射到潜在变换，在此任务上表现出有效性。虽然几何嵌入可以为查询回答提供直接的可解释性框架，但当前方法仅利用了实体的几何构造，未能将逻辑操作映射为纯几何变换，而是使用神经组件来学习这些操作。另一方面，纯神经方法优于几何方法，但在潜在空间中缺乏可解释性。我们提出了GeometrE，一种用于多跳推理的几何嵌入方法，它将每个逻辑操作映射为潜在空间中的纯几何操作。此外，我们引入了一个传递损失函数，并表明与现有方法不同，它可以保留对所有a,b,c的逻辑规则：r(a,b)和r(b,c) -> r(a,c)。我们的实验表明，GeometrE优于当前最先进的几何方法，并在标准基准数据集上与现有的神经方法保持竞争力。

英文摘要

Multi-hop logical reasoning on knowledge graphs requires faithfully mapping the logical semantics to latent space. Current geometric embedding methods have proven useful for this task by mapping entities to geometric regions and logical operations to latent transformations. While a geometric embedding can provide a direct interpretability framework for query answering, current methods have only leveraged the geometric construction of entities, failing to map logical operations to pure geometric transformations and, instead, using neural components to learn these operations. On the other hand, purely neural-based methods outperform geometric methods, but they lack interpretability in the latent space. We introduce GeometrE, a geometric embedding method for multi-hop reasoning that maps every logical operation to a purely geometric operation in the latent space. Additionally, we introduce a transitive loss function and show that, unlike existing methods, it can preserve the logical rule $\forall a,b,c:\ r(a,b) \wedge r(b,c) \rightarrow r(a,c)$. Our experiments show that GeometrE outperforms current state-of-the-art geometric methods and remains competitive with existing neural-based methods on standard benchmark datasets.
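Why a geometric embedding can respect transitivity by construction is easiest to see with box containment: if a relation holds exactly when one box lies inside another, then r(a,b) and r(b,c) force r(a,c). The sketch below uses that generic idea with hand-picked boxes; it is not GeometrE's construction or its transitive loss, which the abstract does not spell out.

```python
def contains(outer, inner):
    """True if box `inner` lies inside box `outer`.

    Each box is a (lower_corner, upper_corner) pair of coordinate lists.
    """
    (outer_lo, outer_hi), (inner_lo, inner_hi) = outer, inner
    lower_ok = all(o <= i for o, i in zip(outer_lo, inner_lo))
    upper_ok = all(i <= o for o, i in zip(outer_hi, inner_hi))
    return lower_ok and upper_ok


def holds(x, y):
    """Hypothetical transitive relation r(x, y): box x is contained in box y."""
    return contains(y, x)


# Hand-picked nested boxes standing in for learned entity embeddings.
a = ([2.0, 2.0], [3.0, 3.0])
b = ([1.0, 1.0], [5.0, 5.0])
c = ([0.0, 0.0], [9.0, 9.0])

transitive_ok = holds(a, b) and holds(b, c) and holds(a, c)
```

Since containment is transitive, no training signal is needed to enforce the rule once the premises hold geometrically; a learned model still needs a loss that pushes the boxes into such a configuration.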


2506.14126 2026-06-18 cs.LG cs.AI 版本更新

From Memorization to Parameter Interference: How Overtraining Experts Harms Model Merging

从记忆到参数干扰：过度训练专家如何损害模型合并

Stefan Horoi, Guy Wolf, Eugene Belilovsky, Gintare Karolina Dziugaite

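Affiliations: Concordia University; Mila - Québec AI Institute; Google DeepMind

AI summary: The paper studies how over-long fine-tuning of expert models affects model merging, finding that prolonged fine-tuning leads to memorization of difficult examples, which causes parameter interference and degrades merging performance, and it proposes task-dependent early stopping strategies that improve merging.

Comments Proceedings of the 43rd International Conference on Machine Learning, Seoul, South Korea. PMLR 306, 2026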

AI中文摘要

现代深度学习日益以使用开放权重基础模型为特征，这些模型可以在专门数据集上进行微调。这导致了专家模型和适配器的激增，通常通过HuggingFace和AdapterHub等平台共享。模型合并最近成为一种有效利用这些现有资源的方法，使得能够组合不同模型检查点的能力。因此，形成了一种自然的流程来利用迁移学习的好处并分摊沉没训练成本：模型在通用数据上预训练，在特定任务上微调，然后合并多个检查点以获得更强大的模型。一个普遍假设是，该流程中某一阶段的改进会向下游传播，从而在后续步骤中带来收益。在这项工作中，我们通过研究专家微调如何影响模型合并来挑战这一假设。我们表明，针对个体性能优化的专家长时间微调会导致跨视觉和语言模态、多种模型规模以及完全微调和LoRA适配模型的合并性能下降。我们将这种退化追溯到对一小部分困难样本的记忆，这些样本主导了微调后期步骤。这会导致负参数干扰，并编码在合并过程中被遗忘的知识。最后，我们证明任务相关的激进早停策略可以显著改善模型合并性能。

英文摘要

Modern deep learning is increasingly characterized by the use of open-weight foundation models that can be fine-tuned on specialized datasets. This has led to a proliferation of expert models and adapters, often shared via platforms like HuggingFace and AdapterHub. Model merging has recently emerged as an effective way to leverage these existing resources, enabling the composition of capabilities from different model checkpoints. A natural pipeline has thus formed to harness the benefits of transfer learning and amortize sunk training costs: models are pre-trained on general data, fine-tuned on specific tasks, and then multiple checkpoints are merged to obtain a more capable model. A prevailing assumption is that improvements at one stage of this pipeline propagate downstream, leading to gains at subsequent steps. In this work, we challenge that assumption by examining how expert fine-tuning affects model merging. We show that long fine-tuning of experts that optimizes for their individual performance leads to degraded merging performance across vision and language modalities, multiple model scales, and both fully fine-tuned and LoRA-adapted models. We trace this degradation to the memorization of a small set of difficult examples that dominate late fine-tuning steps. This causes negative parameter interference and encodes knowledge that is forgotten during merging. Finally, we demonstrate that task-dependent aggressive early stopping strategies can significantly improve model merging performance.


2602.09234 2026-06-18 cs.LG cs.AI 版本更新

Do Neural Networks Lose Plasticity in a Gradually Changing World?

神经网络在渐变世界中会失去可塑性吗？

Tianhui Liu, Lili Mou

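Affiliations: Dept. Computing Science & Alberta Machine Intelligence Institute (Amii), University of Alberta; Canada CIFAR AI Chair

AI summary: The paper studies how the abruptness of task transitions affects loss of plasticity in neural networks, simulating gradually changing environments through input/output interpolation and task sampling; theory and experiments show that the severity of plasticity loss is closely tied to the abruptness of task transitions and is markedly reduced in gradually changing environments.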

2303.18031 2026-06-18 cs.CV cs.AI cs.LG 版本更新

Simple Domain Generalization Methods are Strong Baselines for Open Domain Generalization

简单域泛化方法是开放域泛化的强基线

Masashi Noguchi, Shinichi Shirakawa

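Affiliations: Graduate School of Environment and Information Sciences; Yokohama National University; Faculty of Environment

AI summary: The paper evaluates existing domain generalization methods in open domain generalization, finding that the simple methods CORAL and MMD are competitive with the more complex DAML, and that simple extensions using ensemble learning and Dirichlet mixup data augmentation approach DAML's performance at lower computational cost.

Comments Accepted at IJCNN 2024. The code used in the experiments is available at https://github.com/shiralab/OpenDG-Eval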

DOI: 10.1109/IJCNN60899.2024.10650639

AI中文摘要

在现实应用中，机器学习模型需要处理开放集识别（OSR），即在推理过程中出现未知类别，同时还要处理域偏移，即训练和推理阶段数据分布不同。域泛化（DG）旨在处理推理阶段目标域在模型训练期间不可访问的域偏移情况。开放域泛化（ODG）同时考虑DG和OSR。域增强元学习（DAML）是一种针对ODG的方法，但其学习过程复杂。相比之下，尽管已提出多种DG方法，但它们尚未在ODG场景下进行评估。在本研究中，我们全面评估了现有DG方法在ODG中的表现，并表明两种简单的DG方法——相关对齐（CORAL）和最大均值差异（MMD）——在多种情况下与DAML具有竞争力。此外，我们通过引入DAML中使用的技术（如集成学习和Dirichlet混合数据增强）提出了CORAL和MMD的简单扩展。实验评估表明，扩展后的CORAL和MMD可以以较低的计算成本达到与DAML相当的性能。这表明简单的DG方法及其简单扩展是ODG的强基线。

英文摘要

In real-world applications, a machine learning model is required to handle open-set recognition (OSR), where unknown classes appear during inference, in addition to domain shift, where the data distribution differs between the training and inference phases. Domain generalization (DG) aims to handle the domain shift situation where the target domain of the inference phase is inaccessible during model training. Open domain generalization (ODG) considers DG and OSR together. Domain-augmented meta-learning (DAML) is a method targeting ODG; however, it has a complicated learning process. By contrast, although various DG methods have been proposed, they have not been evaluated in ODG situations. In this study, we comprehensively evaluate the existing DG methods in ODG and show that two simple DG methods, CORrelation ALignment (CORAL) and maximum mean discrepancy (MMD), are competitive with DAML in several cases. In addition, we propose simple extensions of CORAL and MMD by introducing the techniques used in DAML, such as ensemble learning and Dirichlet mixup data augmentation. The experimental evaluation demonstrates that the extended CORAL and MMD can perform comparably to DAML with lower computational costs. This suggests that the simple DG methods and their simple extensions are strong baselines for ODG.
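The two simple baselines named in the abstract both penalize a statistical mismatch between feature distributions from different domains: CORAL compares covariances, MMD compares kernel mean embeddings. The sketch below shows both in plain Python on toy features. It assumes the Deep CORAL scaling of 1/(4d²) and a linear kernel for MMD, which may differ from the configurations evaluated in the paper.

```python
def mean_vec(X):
    """Per-dimension mean of a list of feature vectors."""
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]


def covariance(X):
    """Unbiased sample covariance matrix of the rows of X."""
    n, d = len(X), len(X[0])
    mu = mean_vec(X)
    return [
        [sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in X) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def linear_mmd(Xs, Xt):
    """Squared MMD with a linear kernel: squared distance between feature means."""
    return sum((a - b) ** 2 for a, b in zip(mean_vec(Xs), mean_vec(Xt)))


def coral(Xs, Xt):
    """Squared Frobenius distance between covariances, scaled by 1 / (4 d^2)."""
    d = len(Xs[0])
    Cs, Ct = covariance(Xs), covariance(Xt)
    sq = sum((Cs[i][j] - Ct[i][j]) ** 2 for i in range(d) for j in range(d))
    return sq / (4.0 * d * d)


# Toy features: the 'shifted' domain moves the mean, the 'scaled' one the spread.
source = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
shifted = [[1.0, 1.0], [3.0, 1.0], [1.0, 3.0], [3.0, 3.0]]
scaled = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]]

mmd_shift = linear_mmd(source, shifted)
coral_shift = coral(source, shifted)
coral_scale = coral(source, scaled)
```

A pure translation of the features is visible to linear MMD but not to CORAL, while a change in spread is visible to CORAL, which is why the two losses are complementary as regularizers.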

2510.15551 2026-06-18 cs.CL cs.AI cs.LG 版本更新

Rethinking Cross-lingual Gaps from a Statistical Viewpoint

从统计视角重新思考跨语言差距

Vihari Piratla, Purvam Jain, Darshan Singh, Trevor Cohn, Preethi Jyothi, Partha Talukdar

发表机构 * Google DeepMind（谷歌DeepMind）

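AI summary: Hypothesizes that the cross-lingual gap stems from response variance in the target language, formalizes the gap in terms of biased and unbiased errors, and uses inference-time ensembling to reduce variance, yielding relative gains of 8% to over 50% in source-target transfer scores.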

Comments 30 pages

AI中文摘要

任何知识片段通常以一种或少数几种自然语言表达在网页或大型语料库中。大型语言模型（LLMs）通过从源语言获取知识，并在使用目标语言查询时使其可访问，从而充当桥梁。跨语言差距是指使用目标语言而非源语言查询知识时准确率的下降。现有研究侧重于导致跨语言差距的建模或训练失败。在这项工作中，我们采取另一种视角来表征跨语言错误的性质，并假设目标语言中响应的方差是造成这一差距的关键原因。我们首次将跨语言差距形式化为有偏误差和无偏误差。通过多种控制方差并减少跨语言差距的推理时干预，我们实证验证了我们的假设。我们展示了几种测试时集成方法，这些方法降低了响应方差，从而将源-目标迁移得分提高了多达12个绝对百分点，在各种LLMs上实现了8%到超过50%的相对提升。

英文摘要

Any piece of knowledge is usually expressed in one or a handful of natural languages on the web or in any large corpus. Large Language Models (LLMs) act as a bridge by acquiring knowledge from a source language and making it accessible when queried using target languages. A cross-lingual gap is a drop in accuracy incurred when querying knowledge in a target language rather than the source language. Existing research focused on modeling or training failures leading to cross-lingual gaps. In this work, we take an alternative view to characterize the nature of cross-lingual error, and hypothesize that the variance of responses in the target language is a key cause of this gap. For the first time, we formalize the cross-lingual gap in terms of biased and unbiased errors. We empirically validate our hypothesis through multiple inference-time interventions that control variance and reduce the cross-lingual gap. We demonstrate a few test-time ensemble methods that reduce response variance, and thereby improve source-target transfer scores by up to 12 absolute points yielding relative gains of 8% to over 50% across various LLMs.
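
To illustrate how a test-time ensemble can reduce response variance, here is a minimal sketch that aggregates several sampled answers to the same target-language query by majority vote. Majority voting is an assumed stand-in: the abstract does not specify which ensemble methods the authors use, and the sample answers below are hypothetical.

```python
from collections import Counter

def majority_vote(responses):
    """Return the most frequent answer after light normalization."""
    normalized = [r.strip().lower() for r in responses]
    answer, _count = Counter(normalized).most_common(1)[0]
    return answer

# Hypothetical answers sampled from a model for one target-language query.
samples = ["Paris", "paris ", "Lyon", "Paris", "Marseille"]
ensembled = majority_vote(samples)
```

A single sample would return "Lyon" or "Marseille" some of the time; the vote over five samples settles on the modal answer, which is the sense in which ensembling lowers variance.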

2602.17187 2026-06-18 stat.ML cs.LG 版本更新

Anti-causal domain generalization: Leveraging unlabeled data

反因果域泛化：利用无标签数据

Sorawit Saengkyongam, Juan L. Gamella, Andrew C. Miller, Jonas Peters, Nicolai Meinshausen, Christina Heinze-Deml

发表机构 * Apple（苹果公司）； ETH Zürich（苏黎世联邦理工学院）

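AI summary: For domain generalization in the anti-causal setting, proposes using unlabeled data to estimate environment perturbation directions, achieves robustness by penalizing the model's sensitivity to changes in the mean and covariance of the covariates, and provides worst-case optimality guarantees.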

Comments Accepted at the International Conference on Machine Learning (ICML) 2026

AI中文摘要

域泛化问题关注的是学习在部署到新的、未见过的环境时对分布变化具有鲁棒性的预测模型。现有方法通常需要来自多个训练环境的标记数据，这在标记数据稀缺时限制了它们的适用性。在这项工作中，我们研究了反因果设置下的域泛化，其中结果导致观察到的协变量。在这种结构下，影响协变量的环境扰动不会传播到结果，这促使我们对模型对这些扰动的敏感性进行正则化。关键在于，估计这些扰动方向不需要标签，使我们能够利用来自多个环境的无标签数据。我们提出了两种方法，分别惩罚模型对跨环境协变量均值和协方差变化的敏感性，并证明这些方法在特定环境类别下具有最坏情况最优性保证。最后，我们在一个受控物理系统和一个生理信号数据集上展示了我们方法的实证性能。

英文摘要

The problem of domain generalization concerns learning predictive models that are robust to distribution shifts when deployed in new, previously unseen environments. Existing methods typically require labeled data from multiple training environments, limiting their applicability when labeled data are scarce. In this work, we study domain generalization in an anti-causal setting, where the outcome causes the observed covariates. Under this structure, environment perturbations that affect the covariates do not propagate to the outcome, which motivates regularizing the model's sensitivity to these perturbations. Crucially, estimating these perturbation directions does not require labels, enabling us to leverage unlabeled data from multiple environments. We propose two methods that penalize the model's sensitivity to variations in the mean and covariance of the covariates across environments, respectively, and prove that these methods have worst-case optimality guarantees under certain classes of environments. Finally, we demonstrate the empirical performance of our approach on a controlled physical system and a physiological signal dataset.

2406.14399 2026-06-18 cs.LG cs.CV physics.ao-ph stat.ML 版本更新

Benchmarking Physics-Informed Time-Series Models for Operational Global Station Weather Forecasting

面向全球站点业务天气预报的物理信息时间序列模型基准测试

Tao Han, Zhibin Wen, Zhenghao Chen, Dazhao Du, Song Guo, Lei Bai

发表机构 * Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong SAR China（香港科技大学计算机科学与工程系）； Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China（南方科技大学计算机科学与工程系）； School of Computer and Information Sciences, University of Newcastle, Newcastle, Australia（纽卡斯尔大学计算机与信息科学学院）； Hangzhou Innovation Institute of Beihang University, Hangzhou, China（北京航空航天大学杭州创新研究院）； Shanghai Artificial Intelligence Laboratory, Shanghai, China（上海人工智能实验室）

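AI summary: Introduces the large-scale observational dataset WEATHER-5K and the physics-informed model PhysicsFormer, which enforces physical consistency through pressure-wind alignment and energy-aware smoothness losses, and assesses the gap between academic models and operational systems across several weather variables and extreme event prediction.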

Comments Accepted by ICML2026

AI中文摘要

时间序列预测（TSF）模型的发展常受限于缺乏全面的数据集，尤其是在全球站点天气预报（GSWF）中，现有数据集规模小、时间短且空间稀疏。为解决这一问题，我们引入了WEATHER-5K，一个大规模观测天气数据集，能更好地反映真实世界条件，支持改进模型训练和评估。尽管最近的TSF方法在基准测试上表现良好，但在捕捉复杂天气动态和极端事件方面落后于业务数值天气预报系统。我们提出了PhysicsFormer，一种物理信息预测模型，结合动态核心与Transformer残差来预测未来天气状态。通过压力-风对齐和能量感知平滑损失强制物理一致性，确保在捕捉复杂时间模式的同时保持合理的动力学。我们将PhysicsFormer及其他TSF模型与业务系统在多个天气变量、极端事件预测和模型复杂度上进行基准测试，全面评估学术TSF模型与业务预报之间的差距。数据集和基准测试实现可在以下网址获取：https://github.com/taohan10200/WEATHER-5K。

英文摘要

The development of Time-Series Forecasting (TSF) models is often constrained by the lack of comprehensive datasets, especially in Global Station Weather Forecasting (GSWF), where existing datasets are small, temporally short, and spatially sparse. To address this, we introduce WEATHER-5K, a large-scale observational weather dataset that better reflects real-world conditions, supporting improved model training and evaluation. While recent TSF methods perform well on benchmarks, they lag behind operational Numerical Weather Prediction systems in capturing complex weather dynamics and extreme events. We propose PhysicsFormer, a physics-informed forecasting model combining a dynamic core with a Transformer residual to predict future weather states. Physical consistency is enforced via pressure-wind alignment and energy-aware smoothness losses, ensuring plausible dynamics while capturing complex temporal patterns. We benchmark PhysicsFormer and other TSF models against operational systems across several weather variables, extreme event prediction, and model complexity, providing a comprehensive assessment of the gap between academic TSF models and operational forecasting. The dataset and benchmark implementation are available at: https://github.com/taohan10200/WEATHER-5K.

2508.20330 2026-06-18 cs.LG 版本更新

FORGE: Foundational Optimization Representations from Graph Embeddings

FORGE：基于图嵌入的基础优化表示

Zohair Shafi, Serdar Kadioglu

发表机构 * Khoury College of Computer Science, Northeastern University（东北大学Khoury计算机科学学院）； AI Center of Excellence, Fidelity Investments（富达投资人工智能卓越中心）； Department of Computer Science, Brown University（布朗大学计算机科学系）

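AI summary: Proposes the FORGE framework, which learns general-purpose representations of mixed-integer programming instances by pre-training a vector-quantized graph autoencoder without supervision, with no need for solvers or optimal solutions, and improves solver performance on downstream tasks while outperforming existing methods.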

Comments Published in TMLR

AI中文摘要

组合优化问题在科学和工程中无处不在。然而，基于学习的加速组合优化方法通常需要求解大量困难实例来收集训练数据，导致显著的计算成本。现有的学习方法需要为每个问题分布和每个下游任务训练专用模型，严重限制了其可扩展性和泛化能力。我们提出Forge：基于图嵌入的基础优化表示，这是一个框架，它在大规模、多样化的混合整数规划（MIP）实例集合上以无监督方式预训练向量量化图自编码器，不依赖优化求解器或最优解。向量量化产生离散的代码分配，作为表示优化实例的词汇表。我们在无监督和有监督设置下评估Forge。在无监督设置中，Forge嵌入有效聚类跨问题领域和规模的未见实例。在有监督设置中，我们微调Forge嵌入，并展示单个预训练模型有助于预测用于割生成的整数间隙和用于搜索指导的变量提示，跨越多个问题和规模分布。在这两个任务中，我们提升了商业优化求解器的性能，并超越了最先进的基于学习的方法。最后，我们开源训练代码、预训练Forge权重和多个MIP分布的嵌入，以促进优化问题表示学习的进一步研究。

英文摘要

Combinatorial optimization problems are ubiquitous in science and engineering. Still, learning-based approaches to accelerate combinatorial optimization often require solving a large number of difficult instances to collect training data, incurring significant computational cost. Existing learning-based methods require training dedicated models for each problem distribution and each downstream task, severely limiting their scalability and generalization. We introduce Forge: Foundational Optimization Representations from Graph Embeddings, a framework that pre-trains a vector-quantized graph autoencoder on a large, diverse collection of mixed-integer programming (MIP) instances in an unsupervised manner, without relying on optimization solvers or optimal solutions. Vector quantization produces discrete code assignments that serve as a vocabulary for representing optimization instances. We evaluate Forge in both unsupervised and supervised settings. In the unsupervised setting, Forge embeddings effectively cluster unseen instances across problem domains and sizes. In the supervised setting, we fine-tune Forge embeddings and show that a single pre-trained model helps predict both the integrality gap for cut-generation and variable hints for search guidance across multiple problem and size distributions. In both tasks, we improve the performance of a commercial optimization solver and outperform state-of-the-art learning-based methods. Finally, we open-source our training code, pre-trained Forge weights, and embeddings for multiple MIP distributions to foster further research in representation learning for optimization problems: https://skadio.github.io/forge/

2509.02555 2026-06-18 cs.LG cs.AI cs.NE 版本更新

Surrogate Benchmarks for Model Merging Optimization

模型合并优化的替代基准

Rio Akizuki, Yuya Kudo, Nozomu Yoshinari, Yoichi Hirose, Toshiyuki Nishimoto, Kento Uchida, Shinichi Shirakawa

发表机构 * Yokohama National University（横滨国立大学）

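AI summary: To address the high computational cost of hyperparameter optimization for model merging, builds surrogate benchmarks that predict merged-model performance at low cost and simulate the behavior of optimization algorithms.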

Comments AutoML 2025 Non-Archival Content Track. The code of the surrogate benchmark is available at https://github.com/shiralab/SMM-Bench

AI中文摘要

模型合并技术旨在将多个模型的能力整合到一个模型中。大多数模型合并技术都有超参数，其设置会影响合并模型的性能。由于现有几项工作表明，调整模型合并中的超参数可以增强合并结果，因此为模型合并开发超参数优化算法是一个有前景的方向。然而，其优化过程计算成本高昂，特别是在合并大型语言模型时。在这项工作中，我们为合并超参数的优化开发了替代基准，以实现低成本的算法开发和性能比较。我们定义了两个搜索空间并收集数据样本，以构建替代模型来预测合并模型在给定超参数下的性能。我们证明了我们的基准能够很好地预测合并模型的性能，并模拟优化算法的行为。

英文摘要

Model merging techniques aim to integrate the abilities of multiple models into a single model. Most model merging techniques have hyperparameters, and their setting affects the performance of the merged model. Because several existing works show that tuning hyperparameters in model merging can enhance the merging outcome, developing hyperparameter optimization algorithms for model merging is a promising direction. However, its optimization process is computationally expensive, particularly in merging LLMs. In this work, we develop surrogate benchmarks for optimization of the merging hyperparameters to realize algorithm development and performance comparison at low cost. We define two search spaces and collect data samples to construct surrogate models to predict the performance of a merged model from a hyperparameter. We demonstrate that our benchmarks can predict the performance of merged models well and simulate optimization algorithm behaviors.

2509.22363 2026-06-18 cs.LG eess.AS 版本更新

Investigating Faithfulness in Large Audio Language Models

大型音频语言模型中的忠实性研究

Pooneh Mousavi, Lovenya Jain, Mirco Ravanelli, Cem Subakan

发表机构 * Concordia University（康科迪亚大学）； Mila - Quebec AI Institute（魁北克人工智能研究院）； Université Laval（拉瓦尔大学）； Birla Institute of Technology and Science, Pilani（比拉理工与科学学院皮拉尼校区）

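AI summary: Proposes a systematic framework for evaluating the faithfulness of reasoning chains in large audio language models, defines three criteria for audio faithfulness, and finds through benchmarking that model reasoning can be disconnected from the audio input.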

Comments Accepted to Interspeech 2026

AI中文摘要

大型音频语言模型（LALMs）将音频编码器与预训练的大型语言模型集成，以执行复杂的多模态推理任务。虽然这些模型可以生成思维链（CoT）解释，但这些推理链的忠实性仍不清楚。在这项工作中，我们提出了一个系统框架来评估LALMs中CoT在输入音频和最终模型预测方面的忠实性。我们定义了音频忠实性的三个标准：无幻觉、整体性和专注聆听。我们还引入了一个基于音频和CoT干预的基准来评估忠实性（基准测试界面和评估结果可在以下网址获取：https://poonehmousavi.github.io/faithfulness/）。在Audio Flamingo 3和Qwen2.5-Omni上的实验表明存在潜在的多模态脱节：推理通常与最终预测一致，但并不总是强烈基于音频，并且可能容易受到幻觉或对抗性扰动的影响。

英文摘要

Large Audio Language Models (LALMs) integrate audio encoders with pretrained Large Language Models to perform complex multimodal reasoning tasks. While these models can generate Chain-of-Thought (CoT) explanations, the faithfulness of these reasoning chains remains unclear. In this work, we propose a systematic framework to evaluate CoT faithfulness in LALMs with respect to both the input audio and the final model prediction. We define three criteria for audio faithfulness: hallucination-free, holistic, and attentive listening. We also introduce a benchmark based on both audio and CoT interventions to assess faithfulness (the benchmarking interface and evaluation results are available at https://poonehmousavi.github.io/faithfulness/). Experiments on Audio Flamingo 3 and Qwen2.5-Omni suggest a potential multimodal disconnect: reasoning often aligns with the final prediction but is not always strongly grounded in the audio and can be vulnerable to hallucinations or adversarial perturbations.

2605.07022 2026-06-18 cs.LG 版本更新

Self-Driving Datasets: From 20 Million Papers to Nuanced Biomedical Knowledge at Scale

自驱动数据集：从2000万篇论文到大规模精细化生物医学知识

Haydn Jones, Yimeng Zeng, Alden Rose, Li S. Yifei, Yining Huang, Kaiwen Wu, Jiaming Liang, Maggie Ziyu Huan, Yoseph Barash, Cesar de la Fuente-Nunez, Osbert Bastani, Zachary Ives, Mark Yatskar, Jacob R. Gardner

发表机构 * Department of Computer and Information Science, University of Pennsylvania（宾夕法尼亚大学计算机与信息科学系）； Department of Genetics, University of Pennsylvania（宾夕法尼亚大学遗传学系）； Departments of Bioengineering and Chemical and Biomolecular Engineering, University of Pennsylvania（宾夕法尼亚大学生物工程与化学与生物分子工程系）

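AI summary: Proposes automatically turning PubMed into structured datasets that are larger, more nuanced, and more accurate than curated databases, and shows that the Starling system produces large-scale datasets with improved accuracy across several tasks.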

AI中文摘要

人工编纂的生物医学仓库在生物活性、基因组学和化学领域昂贵且滞后于原始文献，丢弃实验背景，掩盖了评估数据正确性和覆盖范围所需的细微差别。我们证明PubMed本身可以被自动且经济地转化为结构化数据集，这些数据集比它们取代的编纂数据库更大、更细致和更准确。我们提出了三个耦合贡献：(1)基于九个生物医学本体的LLM实体标记流水线，能够在包含2250万篇论文和2500亿个token的PubMed语料库中标记45亿个实体，跨19个类别；(2)混合稀疏密集检索支持在标记语料库上执行实体过滤的语义查询；(3)Starling，一个多代理深度研究系统，仅给定自然语言任务描述，即可设计精度和召回率目标的检索过滤器，诱导提取模式，并输出具有丰富细节字段和支持段落的结构化记录。在六个任务中——血脑屏障渗透性、口服生物利用度、急性毒性（LD50）、基因疾病关联、蛋白质亚细胞定位和化学反应——Starling生成约630万条记录（每任务91K至3M条）；其中一些是目前最大的公开数据集。前沿模型对我们的提取的拒绝率在0.6-7.7%之间，远低于我们在广泛使用的编纂数据集上测量的错误率（例如，BBB_Martins为16.5%，Bioavailability_Ma为7.3%）。除了规模和准确性外，支持段落还携带了表格数据库所丢弃的细微差别——例如，口服生物利用度可能取决于进食与否的状态。共同，语料库、检索和代理为AI驱动的治疗设计建立了基础。代码和数据集：https://github.com/starling-labs/starling.

英文摘要

Manually curated biomedical repositories -- spanning bioactivity, genomics, and chemistry -- are expensive to maintain, lag behind primary literature, and discard experimental context, obscuring nuances needed to assess data correctness and coverage. We show that PubMed itself can be autonomously and cost-effectively turned into structured datasets that are larger, more nuanced, and more accurate than the curated databases they replace. We present three coupled contributions: (1) an LLM-based entity-tagging pipeline, grounded in nine biomedical ontologies, that tags 4.5B entities across 19 categories in a 22.5M-paper, 2.5T-token PubMed corpus; (2) hybrid sparse-dense retrieval supporting entity-filtered semantic queries over the tagged corpus; and (3) Starling, a multi-agent deep research system that, given only a natural-language task description, designs precision- and recall-targeted retrieval filters, induces an extraction schema, and emits structured records with nuance-rich fields and supporting passages. Across six tasks -- blood-brain barrier permeability, oral bioavailability, acute toxicity (LD50), gene-disease associations, protein subcellular localization, and chemical reactions -- Starling produces ~6.3M records (91K-3M per task); several are, to our knowledge, the largest public datasets for their property. Frontier-model rejection of our extractions is 0.6-7.7% across tasks, far below error rates we measure on widely used curated counterparts (e.g., 16.5% on BBB_Martins, 7.3% on Bioavailability_Ma). Beyond scale and accuracy, the supporting passages carry nuance tabular databases discard -- e.g., oral bioavailability may depend on fed vs. fasted state. Together, the corpus, retrieval, and agent establish a foundation for AI-driven therapeutic design. Code and datasets: https://github.com/starling-labs/starling.

2606.07591 2026-06-18 cs.LG cs.AI cs.CL 版本更新

ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research

ResearchClawBench: 端到端自主科学研究基准

Wanghan Xu, Shuo Li, Tianlin Ye, Qinglong Cao, Yixin Chen, Hengjian Gao, Yiheng Wang, Qi Li, Kun Li, Sheng Xu, Shengdu Chai, Fangchen Yu, Xiangyu Zhao, Zhangrui Zhao, Weijie Ma, Zijie Guo, Koutian Wu, Haoyu Zhou, Haoxiang Yin, Lixue Cheng, Chaofan Hu, Haoxuan Li, Lu Mi, Xuxuan Xie, Yifan Zhou, Ruizhe Chen, Zhiwang Zhou, Xingjian Guo, Yuhao Zhou, Xuming He, Shengyuan Xu, Xinyu Gu, Jiamin Wu, Mianxin Liu, Chunfeng Song, Fenghua Ling, Dongzhan Zhou, Shixiang Tang, Yuqiang Li, Mao Su, Peng Ye, Siqi Sun, Bin Wang, Xue Yang, Zhenfei Yin, Tianfan Fu, Guangtao Zhai, Wanli Ouyang, Bo Zhang, Lei Bai, Wenlong Zhang

发表机构 * Shanghai Artificial Intelligence Laboratory（上海人工智能实验室）

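AI summary: Presents the ResearchClawBench benchmark, with 40 tasks from 10 domains, which evaluates autonomous research capability using multimodal rubrics; the strongest agent averages only 21.5, revealing failures in experimental protocol, evidence matching, and scientific core.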

AI中文摘要

AI编码智能体越来越多地用于科学工作，但其端到端自主研究能力仍然难以验证。我们提出了ResearchClawBench，一个用于评估自主科学研究的基准，涵盖来自10个科学领域的40个任务。每个任务基于一篇真实发表论文，提供相关文献和原始数据，并在评估期间隐藏目标论文。专家策划的多模态评分标准将目标科学制品分解为加权标准，从而能够评估目标论文级别的重新发现，同时为新发现留出空间。我们在统一协议下评估了七个自主研究（auto-research）智能体，并通过轻量级ResearchHarness评估了十七个原生LLM。当前系统远未达到可靠的重新发现：最强的自主智能体Claude Code平均得分为21.5，最强的ResearchHarness LLM Claude-Opus-4.7平均得分为20.7，LLM前沿均值仅为26.5。错误分析表明，失败集中在实验协议不匹配、证据不匹配和缺失科学核心。ResearchClawBench为衡量自主科学研究进展提供了一个可复现的评估前沿。

英文摘要

AI coding agents are increasingly used for scientific work, but their end-to-end autonomous research capability remains difficult to verify. We present ResearchClawBench, a benchmark for evaluating autonomous scientific research across 40 tasks from 10 scientific domains. Each task is grounded in a real published paper, provides related literature and raw data, and hides the target paper during evaluation. Expert-curated multimodal rubrics decompose the target scientific artifacts into weighted criteria, enabling evaluation of target-paper-level re-discovery while leaving room for new discovery. We evaluate seven autonomous research (auto-research) agents under a unified protocol and seventeen native LLMs through the lightweight ResearchHarness. Current systems remain far from reliable re-discovery: the strongest autonomous agent, Claude Code, averages 21.5, and the strongest ResearchHarness LLM, Claude-Opus-4.7, averages 20.7, with an LLM frontier mean of only 26.5. Error analysis shows that failures concentrate in experimental protocol mismatch, evidence mismatch, and missing scientific core. ResearchClawBench provides a reproducible evaluation frontier for measuring progress toward autonomous scientific research.

2407.18245 2026-06-18 cs.CV cs.LG 版本更新

TopBench：表格问答中隐式预测推理的基准

An-Yang Ji, Jun-Peng Jiang, De-Chuan Zhan, Han-Jia Ye

发表机构 * School of Artificial Intelligence, Nanjing University, China（南京大学人工智能学院）； National Key Laboratory for Novel Software Technology, Nanjing University, China（南京大学计算机软件新技术国家重点实验室）

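AI summary: Introduces TopBench, a benchmark of 779 samples across four sub-tasks that evaluates whether large language models can recognize implicit predictive intent in table question answering and reason reliably, and finds that current models struggle with intent recognition.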

AI中文摘要

大型语言模型（LLM）推动了表格问答的发展，其中大多数查询可以通过提取信息或简单聚合来回答。然而，一类常见的现实世界查询是隐式预测性的，需要从历史模式中推断未观察到的答案，而不仅仅是检索。这些查询带来了两个挑战：识别潜在意图和对大规模表格进行可靠的预测推理。为了评估LLM在带有隐式预测任务的表格问答中的表现，我们引入了TopBench，一个包含779个样本的基准，涵盖四个子任务，从单点预测到决策制定、处理效应分析和复杂过滤，要求模型生成涵盖推理文本和结构化表格的输出。我们在基于文本和代理工作流下评估了多种模型。实验表明，当前模型通常在意图识别上存在困难，默认进行查找。更深入的分析发现，准确的意图消歧是引导这些预测行为的前提。此外，提升预测精度的上限需要整合更复杂的建模或推理能力。

英文摘要

Large Language Models (LLMs) have advanced Table Question Answering, where most queries can be answered by extracting information or simple aggregation. However, a common class of real-world queries is implicitly predictive, requiring the inference of unobserved answers from historical patterns rather than mere retrieval. These queries introduce two challenges: recognizing latent intent and reliable predictive reasoning over massive tables. To assess LLMs in such Tabular questiOn answering with implicit Prediction tasks, we introduce TopBench, a benchmark consisting of 779 samples across four sub-tasks, ranging from single-point prediction to decision making, treatment effect analysis, and complex filtering, requiring models to generate outputs spanning reasoning text and structured tables. We evaluate diverse models under both text-based and agentic workflows. Experiments reveal that current models often struggle with intent recognition, defaulting to just lookups. Deeper analysis identifies that accurate intent disambiguation serves as the prerequisite for leading these predictive behaviors. Furthermore, elevating the upper bound of prediction precision requires the integration of more sophisticated modeling or reasoning capabilities.

2605.03460 2026-06-18 cs.AI cs.LG 版本更新

FinSTaR: Towards Financial Reasoning with Time Series Reasoning Models

FinSTaR：面向时间序列推理模型的金融推理

Seunghan Lee, Jun Seo, Jaehoon Lee, Sungdong Yoo, Minjae Kim, Tae Yoon Lim, Dongwan Kang, Hwanil Choi, Soonyoung Lee, Wonbin Ahn

发表机构 * LG AI Research（LG人工智能研究院）

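AI summary: To address the failure of time series reasoning models in the financial domain, proposes FinSTaR, built on a 2 x 2 capability taxonomy, which reaches 78.9% average accuracy on the FinTSR-Bench benchmark using Compute-in-CoT and Scenario-Aware CoT strategies.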

Comments KDD Workshop on SciSoc Agents & LLMs 2026 (Oral Presentation)

AI中文摘要

时间序列推理模型在通用领域表现出色，但在具有独特特征的金融领域却持续失败。我们提出一个通用的2x2能力分类法，通过交叉1)单实体与多实体分析，以及2)当前状态评估与未来行为预测来划分TSRM能力。我们在金融领域实例化该分类法——其中确定性评估与随机性预测的区分尤为关键——形成十个金融推理任务，并基于标普股票构建FinTSR-Bench基准。为此，我们提出FinSTaR（金融时间序列思考与推理），在FinTSR-Bench上训练，并针对每个类别采用不同的思维链策略。对于评估（确定性，即可从可观测数据计算得出），我们采用Compute-in-CoT，一种程序化思维链，使模型能够直接从原始价格推导答案。对于预测（本质上是随机的，即受不可观测因素影响），我们采用场景感知思维链，在做出判断前生成多种场景，模拟金融分析师在不确定性下的推理方式。所提方法在FinTSR-Bench上达到78.9%的平均准确率，显著优于LLM和TSRM基线。此外，我们展示了四个能力类别通过联合训练具有互补性和相互增强性，并且场景感知思维链相比标准思维链持续提升预测准确率。代码已公开：https://github.com/seunghan96/FinSTaR。

英文摘要

Time series (TS) reasoning models (TSRMs) have shown promising capabilities in general domains, yet they consistently fail in the financial domain, which exhibits unique characteristics. We propose a general 2 x 2 capability taxonomy for TSRMs by crossing 1) single-entity vs. multi-entity analysis with 2) assessment of the current state vs. prediction of future behavior. We instantiate this taxonomy in the financial domain -- where the distinction between deterministic assessment and stochastic prediction is particularly critical -- as ten financial reasoning tasks, forming the FinTSR-Bench benchmark based on S&P stocks. To this end, we propose FinSTaR (Financial Time Series Thinking and Reasoning), trained on FinTSR-Bench with distinct chain-of-thought (CoT) strategies tailored to each category. For assessment, which is deterministic (i.e., computable from observable data), we employ Compute-in-CoT, a programmatic CoT that enables models to derive answers directly from raw prices. For prediction, which is inherently stochastic (i.e., subject to unobservable factors), we adopt Scenario-Aware CoT, which generates diverse scenarios before making a judgment, mirroring how financial analysts reason under uncertainty. The proposed method achieves 78.9% average accuracy on FinTSR-Bench, substantially outperforming LLM and TSRM baselines. Furthermore, we show that the four capability categories are complementary and mutually reinforcing through joint training, and that Scenario-Aware CoT consistently improves prediction accuracy over standard CoT. Code is available at https://github.com/seunghan96/FinSTaR.
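
The abstract describes assessment tasks as deterministic, meaning the answer is computable from observable prices. As a rough illustration of what "deriving answers directly from raw prices" can look like, the sketch below computes two standard quantities, simple returns and maximum drawdown, from a short price series. The choice of quantities, the function names, and the prices are assumptions for illustration; they are not the benchmark's task definitions or the authors' code.

```python
def simple_returns(prices):
    """Period-over-period simple returns: (p_t - p_{t-1}) / p_{t-1}."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]

def max_drawdown(prices):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = prices[0]
    worst = 0.0
    for p in prices:
        peak = max(peak, p)                    # highest price seen so far
        worst = max(worst, (peak - p) / peak)  # decline relative to that peak
    return worst

# Hypothetical closing prices.
prices = [100.0, 110.0, 99.0, 105.0]
returns = simple_returns(prices)
drawdown = max_drawdown(prices)
```

Both values follow from the observed series alone, which is what separates this kind of assessment from forecasting future behavior.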

2606.16000 2026-06-18 cs.CL cs.LG 版本更新

GRACE-DS: a Guarded Reward-guided Agent Correction Environment in Data Science

GRACE-DS：数据科学中的受保护奖励引导智能体修正环境

Aleksandr Tsymbalov, Danis Zaripov, Artem Epifanov, Anastasiya Palienko

发表机构 * ITMO University（ITMO大学）； HSE University（高等经济学院）

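AI summary: Proposes GRACE-DS, an isolated environment for pre-deployment evaluation of LLM-powered AutoML agents, in which hidden executable validators measure predictive performance, leakage avoidance, reproducibility, and related criteria; experiments show that its flexible iterative interaction regime outperforms the baselines.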

AI中文摘要

我们介绍了GRACE-DS，一个数据科学中的受保护奖励引导智能体修正环境，用于对LLM驱动的AutoML智能体进行部署前评估。GRACE-DS是一组在隔离环境中的评估指标，可应用于特定组织的表格ML任务。它将智能体暴露于现实的工作流阶段，从规划和数据检查到特征工程、模型开发、验证、代码修复直至最终提交，同时隐藏的可执行验证器不仅衡量最终预测性能，还衡量泄漏避免、可重复性、协议有效性、修正行为和奖励对齐。最强的结构化机制——灵活迭代交互（我们的方法）——实现了比单次生成、非结构化交互和基于重启的基线更高的端到端归一化隐藏测试质量，同时提高了协议有效完成率。经过7000多个回合的验证，这些结果确立了GRACE-DS作为评估基于LLM的AutoML智能体在生产类条件下按照组织特定要求执行机器学习工作流能力的稳健平台。

英文摘要

We introduce GRACE-DS, a Guarded Reward-guided Agent Correction Environment in Data Science for pre-deployment evaluation of LLM-powered AutoML agents. GRACE-DS is a set of evaluation metrics in an isolated environment that can be applied to tabular ML tasks specific to a particular organization. It exposes agents to realistic workflow stages, from planning and data inspection through feature engineering, model development, validation, and code repair to final submission, while hidden executable validators measure not only final predictive performance but also leakage avoidance, reproducibility, protocol validity, correction behavior, and reward alignment. The strongest structured regime, flexible iterative interaction (our approach), achieves higher end-to-end normalized hidden-test quality than single-shot generation, unstructured interaction, and restart-based baselines, while also improving protocol-valid completion. Validated across more than 7,000 episodes, these results establish GRACE-DS as a robust platform for assessing the capacity of LLM-based AutoML agents to execute machine learning workflows under production-like conditions and in accordance with organization-specific requirements.

2410.15595 2026-06-18 cs.AI cs.CL cs.LG 版本更新

A Comprehensive Survey of Direct Preference Optimization: Datasets, Theories, Variants, and Applications

直接偏好优化综述：数据集、理论、变体及应用

Wenyi Xiao, Zechuan Wang, Leilei Gan, Shuai Zhao, Zongrui Li, Ruirui Lei, Wanggui He, Luu Anh Tuan, Long Chen, Hao Jiang, Zhou Zhao, Fei Wu

发表机构 * Zhejiang University（浙江大学）； Nanyang Technological University（南洋理工大学）； Alibaba Group（阿里巴巴集团）

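AI summary: Surveys progress on Direct Preference Optimization (DPO) in theory, variants, datasets, and applications, discusses its potential and limitations as an RL-free alternative, and proposes future research directions.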

Comments Accepted by TPAMI 2026. Project page: https://github.com/Mr-Loevan/DPO-Survey

DOI: 10.1109/TPAMI.2026.3704314

AI中文摘要

随着大语言模型（LLMs）的快速发展，将策略模型与人类偏好对齐变得日益关键。直接偏好优化（DPO）作为一种有前景的对齐方法，作为从人类反馈中强化学习（RLHF）的无RL替代方案而出现。尽管DPO取得了各种进展并存在固有局限性，但文献中目前缺乏对这些方面的深入综述。在这项工作中，我们对DPO中的挑战和机遇进行了全面回顾，涵盖理论分析、变体、相关偏好数据集和应用。具体而言，我们基于关键研究问题对近期DPO研究进行分类，以提供对DPO当前格局的透彻理解。此外，我们提出了几个未来研究方向，为研究社区提供模型对齐的见解。相关论文的更新合集可在 https://github.com/Mr-Loevan/DPO-Survey 找到。

英文摘要

With the rapid advancement of large language models (LLMs), aligning policy models with human preferences has become increasingly critical. Direct Preference Optimization (DPO) has emerged as a promising approach for alignment, acting as an RL-free alternative to Reinforcement Learning from Human Feedback (RLHF). Despite DPO's various advancements and inherent limitations, an in-depth review of these aspects is currently lacking in the literature. In this work, we present a comprehensive review of the challenges and opportunities in DPO, covering theoretical analyses, variants, relevant preference datasets, and applications. Specifically, we categorize recent studies on DPO based on key research questions to provide a thorough understanding of DPO's current landscape. Additionally, we propose several future research directions to offer insights on model alignment for the research community. An updated collection of relevant papers can be found on https://github.com/Mr-Loevan/DPO-Survey.
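
For readers unfamiliar with why DPO counts as RL-free, the sketch below evaluates the standard DPO objective for a single preference pair: the negative log-sigmoid of a scaled difference between the policy's and the reference model's log-probability ratios on the chosen and rejected responses. This is the commonly cited form of the loss, not something stated in the abstract; the function name, argument names, and the default `beta=0.1` are illustrative assumptions.

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * margin))."""
    # How much more the policy prefers the chosen response than the reference does.
    margin = ((policy_logp_chosen - ref_logp_chosen)
              - (policy_logp_rejected - ref_logp_rejected))
    z = beta * margin
    # -log(sigmoid(z)) == log(1 + exp(-z)), computed in a numerically stable way.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))

# A policy identical to the reference gives log(2); favoring the chosen answer lowers the loss.
neutral = dpo_loss(-1.0, -1.0, -1.0, -1.0)
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

The loss depends only on sequence log-probabilities from two models, so it can be minimized with ordinary gradient descent and needs neither a separate reward model nor policy rollouts.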

2509.24725 2026-06-18 cs.LG cs.AI 版本更新

Q-Net: Queue Length Estimation via Kalman-based Neural Networks

Q-Net：基于卡尔曼神经网络的队列长度估计

Ting Gao, Elvin Isufi, Winnie Daamen, Erik-Sander Smits, Serge Hoogendoorn

发表机构 * University of Amsterdam（阿姆斯特丹大学）； Delft University of Technology（代尔夫特理工大学）

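AI summary: Proposes the Q-Net framework, which combines Kalman filtering with neural networks to fuse data sources for queue length estimation at signalized intersections, improves spatial transferability and real-time applicability, and achieves accurate queue estimation without costly sensing equipment.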

DOI: 10.1016/j.trc.2026.105809

AI中文摘要

估计信号交叉口的队列长度一直是交通管理中的长期挑战。尽管有两类隐私保护的数据源可用：(i) 接近停止线的环形检测器提供的车辆计数汇总数据，以及 (ii) 提供路段平均速度测量的汇总浮动汽车数据 (aFCD)，车流的部分可观测性仍使这一任务变得复杂。然而，如何将这些具有不同空间和时间分辨率的数据源整合用于队列长度估计仍不清楚。为此，本文提出Q-Net：一种基于状态空间形式的队列估计框架。该设计解决了队列建模中的关键挑战，如违反交通守恒假设。Q-Net遵循卡尔曼预测-更新结构，并在状态演变和测量模型中保持物理可解释性。Q-Net使用AI增强的卡尔曼滤波器从数据中学习时间变化的增益动态。该框架支持实时实现，并通过将aFCD测量分组为固定大小的局部组来提高空间转移性，使可学习参数的数量与路段长度无关。在荷兰鹿特丹城市主干道的评估显示，Q-Net优于基线方法，能够准确追踪队列的形成和消散，并缓解aFCD引起的延迟。通过结合数据效率、可解释性、实时适用性和空间转移性，Q-Net在无需昂贵的传感基础设施（如摄像头或雷达）的情况下实现了准确的队列长度估计。

英文摘要

Estimating queue lengths at signalized intersections is a long-standing challenge in traffic management. Partial observability of vehicle flows complicates this task despite the availability of two privacy-preserving data sources: (i) aggregated vehicle counts from loop detectors near stop lines, and (ii) aggregated floating car data (aFCD) that provide segment-wise average speed measurements. However, how to integrate these sources with differing spatial and temporal resolutions for queue length estimation is rather unclear. Addressing this question, we present Q-Net: a queue estimation framework built upon a state-space formulation. This design addresses key challenges in queue modeling, such as violations of traffic conservation assumptions. Q-Net follows the Kalman predict-update structure and maintains physical interpretability in both the state evolution and measurement models. Q-Net uses an AI-augmented Kalman filter to learn time-varying gain dynamics from data. The framework supports real-time implementation and improves spatial transferability by grouping aFCD measurements into fixed-size local groups, making the number of learnable parameters independent of section length. Evaluations on urban main roads in Rotterdam, the Netherlands, show that Q-Net outperforms baseline methods, tracks queue formation and dissipation accurately, and mitigates aFCD-induced delays. By combining data efficiency, interpretability, real-time applicability, and spatial transferability, Q-Net makes accurate queue length estimation possible without costly sensing infrastructure like cameras or radar.
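
The Kalman predict-update structure mentioned in the abstract can be sketched for a scalar state as follows. This is the textbook filter with the closed-form gain, shown only to make the two phases concrete; Q-Net, per the abstract, learns time-varying gain dynamics from data instead, and the model parameters and measurements below are hypothetical.

```python
def kalman_step(x, p, z, f=1.0, h=1.0, q=0.01, r=1.0):
    """One predict-update cycle of a scalar Kalman filter.

    x, p: previous state estimate and its variance
    z:    new measurement
    f, h: state-transition and measurement coefficients
    q, r: process and measurement noise variances
    """
    # Predict: propagate the estimate and its uncertainty through the state model.
    x_pred = f * x
    p_pred = f * p * f + q
    # Update: weigh the measurement residual by the Kalman gain.
    k = p_pred * h / (h * p_pred * h + r)
    x_new = x_pred + k * (z - h * x_pred)
    p_new = (1.0 - k * h) * p_pred
    return x_new, p_new, k

# Hypothetical noisy queue-length measurements (in vehicles).
x, p = 0.0, 1.0
for z in [4.0, 5.0, 6.0]:
    x, p, k = kalman_step(x, p, z)
```

A learned filter in the spirit of the abstract would replace the closed-form expression for `k` with the output of a trained network while keeping the same predict and update equations.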

2307.05623 2026-06-18 cs.LG cs.AI 版本更新

A DeepLearning Framework for Dynamic Estimation of Origin-Destination Sequence

一种用于动态估计起点-终点序列的深度学习框架

Zheli Xiong, Defu Lian, Enhong Chen, Gang Chen, Xiaomin Cheng

发表机构 * School of Data Science, University of Science and Technology of China（中国科学技术大学数据科学学院）； Yangtze River Delta Information Intelligence Innovation Research Institute, China（长江三角洲信息智能创新研究院）

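AI summary: To address the underdetermination and lag problems in OD matrix estimation, proposes an integrated deep learning method that uses a neural network to infer the structure of the OD sequence and guide numerical optimization; experiments show that it provides effective spatial and temporal constraints.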

Comments 11 pages, 25 figures

AI中文摘要

OD矩阵估计是交通领域的一个关键问题。主要方法利用交通传感器测量信息（如交通计数）来估计由OD矩阵表示的交通需求。该问题分为两类：静态OD矩阵估计和动态OD矩阵序列（简称OD序列）估计。上述两类都面临由大量待估参数和不足的约束信息引起的欠定性问题。此外，OD序列估计还面临滞后挑战：由于拥堵等不同交通状况，同一车辆在相同观测时段内会出现在不同路段，导致相同的OD需求对应不同的行程。为此，本文提出一种集成方法，利用深度学习方法推断OD序列的结构，并利用结构约束指导传统数值优化。实验表明，神经网络能有效推断OD序列的结构，并为数值优化提供实用的约束以获得更好的结果。此外，实验表明，所提供的结构信息不仅包含对OD矩阵空间结构的约束，还提供了对OD序列时间结构的约束，很好地解决了滞后问题的影响。

英文摘要

OD matrix estimation is a critical problem in the transportation domain. The principal method uses information measured by traffic sensors, such as traffic counts, to estimate the traffic demand represented by the OD matrix. The problem is divided into two categories: static OD matrix estimation and dynamic OD matrix sequence (OD sequence for short) estimation. Both face the underdetermination problem caused by the large number of parameters to be estimated and insufficient constraint information. In addition, OD sequence estimation faces the lag challenge: due to varying traffic conditions such as congestion, the same vehicle appears on different road sections during the same observation period, so identical OD demands correspond to different trips. To this end, this paper proposes an integrated method that uses deep learning to infer the structure of the OD sequence and uses structural constraints to guide traditional numerical optimization. Our experiments show that a neural network (NN) can effectively infer the structure of the OD sequence and provide practical constraints for numerical optimization to obtain better results. Moreover, the experiments show that the inferred structural information constrains not only the spatial structure of the OD matrices but also the temporal structure of the OD sequence, which addresses the effect of the lag problem well.
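
The underdetermination problem can be made concrete with a toy example: when there are more OD pairs than counted links, different demand vectors reproduce exactly the same sensor counts. The sketch below assumes a hypothetical network with three OD pairs and two counted links; the assignment matrix and the demand values are invented for illustration.

```python
def link_counts(assignment, od_demand):
    """Traffic count on each link: sum of the OD demands routed over it."""
    return [sum(a * d for a, d in zip(row, od_demand)) for row in assignment]

# Rows are counted links, columns are OD pairs; 1 means the OD pair uses the link.
assignment = [
    [1, 1, 0],  # link 1 carries OD pairs 0 and 1
    [0, 1, 1],  # link 2 carries OD pairs 1 and 2
]

od_a = [10, 5, 20]
od_b = [5, 10, 15]

counts_a = link_counts(assignment, od_a)
counts_b = link_counts(assignment, od_b)
```

Both demand vectors yield counts of 15 and 25, so the counts alone cannot tell them apart; additional structural constraints, such as those the abstract proposes to infer with a neural network, are needed to pick one.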

2506.13196 2026-06-18 cs.LG 版本更新

KEPLA: A Knowledge-Enhanced Deep Learning Framework for Accurate Protein-Ligand Binding Affinity Prediction

KEPLA：一种用于精确预测蛋白质-配体结合亲和力的知识增强深度学习框架

Han Liu, Keyan Ding, Peilin Chen, Yinwei Wei, Liqiang Nie, Dapeng Wu, Shiqi Wang

发表机构 * Department of Computer Science, City University of Hong Kong（香港城市大学计算机科学系）； ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University（浙江大学杭州国际科技创新中心）； School of Software, Shandong University（山东大学软件学院）； College of Informatics, Harbin Institute of Technology (Shenzhen)（哈尔滨工业大学（深圳）计算机学院）

AI总结提出KEPLA框架，通过整合基因本体和配体属性的先验知识，利用全局表示对齐与局部交叉注意力，提升蛋白质-配体结合亲和力预测的准确性，在多个基准数据集上超越现有方法。

AI中文摘要

准确预测蛋白质-配体结合亲和力对药物发现至关重要。尽管最近的深度学习方法已展现出有希望的结果，但它们通常仅依赖蛋白质和配体的结构特征，忽略了与结合亲和力相关的宝贵生化知识。为解决这一局限，我们提出KEPLA，一种新颖的深度学习框架，明确整合来自基因本体和配体属性的先验知识以增强预测性能。KEPLA以蛋白质序列和配体分子图作为输入，并优化两个互补目标：（1）将全局表示与知识图谱关系对齐，以捕获领域特定的生化见解；（2）利用局部表示之间的交叉注意力构建细粒度联合嵌入用于预测。在两个基准数据集上的域内和跨域场景实验表明，KEPLA始终优于最先进的基线方法。此外，基于知识图谱关系和交叉注意力图的可解释性分析为潜在的预测机制提供了有价值的见解。

英文摘要

Accurate prediction of protein-ligand binding affinity is critical for drug discovery. While recent deep learning approaches have demonstrated promising results, they often rely solely on structural features of proteins and ligands, overlooking valuable biochemical knowledge associated with binding affinity. To address this limitation, we propose KEPLA, a novel deep learning framework that explicitly integrates prior knowledge from Gene Ontology and ligand properties to enhance prediction performance. KEPLA takes protein sequences and ligand molecular graphs as input and optimizes two complementary objectives: (1) aligning global representations with knowledge graph relations to capture domain-specific biochemical insights, and (2) leveraging cross attention between local representations to construct fine-grained joint embeddings for prediction. Experiments on two benchmark datasets across both in-domain and cross-domain scenarios demonstrate that KEPLA consistently outperforms state-of-the-art baselines. Furthermore, interpretability analyses based on knowledge graph relations and cross attention maps provide valuable insights into the underlying predictive mechanisms.

2508.09191 2026-06-18 cs.LG cs.AI 版本更新

From Values to Tokens: An LLM-Driven Framework for Context-aware Time Series Forecasting via Symbolic Discretization

从数值到标记：一种基于符号离散化的LLM驱动上下文感知时间序列预测框架

Xiaoyu Tao, Shilong Zhang, Mingyue Cheng, Daoyu Wang, Tingyue Pan, Bokai Pan, Changqing Zhang, Shijin Wang

发表机构 * State Key Laboratory of Cognitive Intelligence（认知智能国家重点实验室）； University of Science and Technology of China（中国科学技术大学）； College of Intelligence and Computing（智能科学与计算学院）； iFLYTEK Research（科大讯飞研究院）

AI总结提出TokenCast框架，利用大语言模型通过符号离散化将连续时间序列转化为标记，与上下文文本对齐，实现上下文感知的预测，实验证明有效。

AI中文摘要

时间序列预测在能源、医疗和金融等关键应用领域支持决策中起着重要作用。尽管近期取得了进展，但由于将历史数值序列与通常包含非结构化文本数据的上下文特征整合的挑战，预测精度仍然有限。为了解决这一挑战，我们提出了TokenCast，一个由大语言模型（LLM）驱动的框架，利用基于语言的符号表示作为上下文感知时间序列预测的统一中介。具体来说，TokenCast采用离散分词器将连续数值序列转化为时间标记，实现与基于语言输入的结构对齐。为了有效弥合模态之间的语义差距，时间和上下文标记通过预训练的LLM嵌入到共享表示空间中，并通过生成目标进一步优化。基于这一统一语义空间，对齐的LLM随后以监督方式进行微调，以预测未来的时间标记，然后解码回原始数值空间。在真实世界数据集上的大量实验证明了我们框架的有效性，并突显了其作为上下文感知时间序列预测生成框架的潜力。代码见 https://github.com/Xiaoyu-Tao/TokenCast。

英文摘要

Time series forecasting plays a vital role in supporting decision-making across a wide range of critical applications, including energy, healthcare, and finance. Despite recent advances, forecasting accuracy remains limited due to the challenge of integrating historical numerical sequences with contextual features, which often comprise unstructured textual data. To address this challenge, we propose TokenCast, a large language model (LLM) driven framework that leverages language-based symbolic representations as a unified intermediary for context-aware time series forecasting. Specifically, TokenCast employs a discrete tokenizer to transform continuous numerical sequences into temporal tokens, enabling structural alignment with language-based inputs. To effectively bridge the semantic gap between modalities, both temporal and contextual tokens are embedded into a shared representation space via a pre-trained LLM, further optimized with generative objectives. Building upon this unified semantic space, the aligned LLM is subsequently fine-tuned in a supervised manner to predict future temporal tokens, which are then decoded back into the original numerical space. Extensive experiments on real-world datasets demonstrate the effectiveness of our framework and highlight its potential as a generative framework for context-aware time series forecasting. The code is available at https://github.com/Xiaoyu-Tao/TokenCast.
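
The idea of turning continuous values into discrete tokens and decoding them back can be sketched in a few lines. The example below uses plain uniform binning as a stand-in; this is an assumption for illustration and is not the tokenizer used by TokenCast. The function names `fit_bins`, `encode`, and `decode` are hypothetical, and the series is assumed to be non-constant so that the bin width is positive.

```python
def fit_bins(series, n_bins):
    """Return (lower bound, bin width) of a uniform grid covering the series."""
    lo, hi = min(series), max(series)
    return lo, (hi - lo) / n_bins


def encode(series, lo, width, n_bins):
    """Map each value to the index of its bin, clamped to [0, n_bins - 1]."""
    tokens = []
    for x in series:
        idx = int((x - lo) / width)
        tokens.append(min(max(idx, 0), n_bins - 1))
    return tokens


def decode(tokens, lo, width):
    """Map each token back to the centre of its bin."""
    return [lo + (t + 0.5) * width for t in tokens]


series = [0.0, 1.0, 2.5, 4.0, 3.5]  # toy values, not from the paper
lo, width = fit_bins(series, n_bins=4)
tokens = encode(series, lo, width, n_bins=4)
reconstructed = decode(tokens, lo, width)
print(tokens)         # [0, 1, 2, 3, 3]
print(reconstructed)  # [0.5, 1.5, 2.5, 3.5, 3.5]
```

In this scheme the reconstruction error of any in-range value is at most half a bin width, which is the price paid for a finite vocabulary that a language model can predict over.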

2511.05221 2026-06-18 cs.LG q-bio.NC 版本更新

ActiTect: A Generalizable Machine Learning Pipeline for REM Sleep Behavior Disorder Screening through Standardized Actigraphy

ActiTect：通过标准化体动记录进行REM睡眠行为障碍筛查的通用机器学习流程

David Bertram, Anja Ophey, Sinah Röttgen, Konstantin Kufer, Gereon R. Fink, Elke Kalbe, Clint Hansen, Walter Maetzler, Maximilian Kapsecker, Lara M. Reimer, Stephan Jonas, Andreas T. Damgaard, Natasha B. Bertelsen, Casper Skjaerbaek, Per Borghammer, Karolien Groenewald, Pietro-Luca Ratti, Michele T. Hu, Noémie Moreau, Michael Sommerauer, Katarzyna Bozek

发表机构 * Faculty of Mathematics and Natural Sciences, University of Cologne, Germany（科隆大学数学与自然科学学院，德国）； Institute for Biomedical Informatics, Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany（科隆大学医学院与科隆大学医院生物医学信息学研究所，德国）； Center for Molecular Medicine Cologne (CMMC), Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany（科隆分子医学中心（CMMC），科隆大学医学院与科隆大学医院，德国）； Medical Psychology | Neuropsychology and Gender Studies, Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany（科隆大学医学院与科隆大学医院医学心理学 | 神经心理学与性别研究，德国）； Cognitive Neuroscience, Institute for Neuroscience and Medicine, INM-3, Research Center Juelich, Germany（认知神经科学，神经科学与医学研究所，Juelich研究中心，德国）； Department of Neurology, Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany（科隆大学医学院与科隆大学医院神经科，德国）； Center of Neurology, Department of Parkinson, Sleep and Movement Disorders, University Hospital Bonn, University of Bonn, Germany（神经科中心，帕金森、睡眠与运动障碍部门，波恩大学医院，德国）； German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany（德国神经退行性疾病研究中心（DZNE），波恩，德国）； Cluster of Excellence for Aging and Aging-Associated Diseases (CECAD), University of Cologne, Germany（老龄化与相关疾病卓越中心（CECAD），科隆大学，德国）； Department of Neurology, University Medical Center Schleswig-Holstein, Campus Kiel and Kiel University, Germany（神经科，石勒苏益格-荷尔斯泰因大学医学中心基尔校区和基尔大学，德国）； Department of Informatics, Technical University of Munich, Germany（信息学院，慕尼黑技术大学，德国）； Institute for Digital Medicine, University Hospital Bonn, Germany（数字医学研究所，波恩大学医院，德国）； Lundbeck Foundation Parkinson’s Disease Research Center (PACE), Aarhus University, Denmark（灵北基金会帕金森病研究中心（PACE），奥胡斯大学，丹麦）； Department of Nuclear Medicine, Aarhus University Hospital, Denmark（核医学部，奥胡斯大学医院，丹麦）； Department of Electrical and Computer Engineering, Aarhus University, Denmark（电气与计算机工程系，奥胡斯大学，丹麦）； Oxford Parkinson’s Disease Centre and Division of Neurology, Nuffield Department of Clinical Neurosciences, University of Oxford, UK（牛津帕金森病中心与神经科，牛津大学纳菲尔德临床神经科学系，英国）

AI总结提出ActiTect，一个全自动开源机器学习工具，通过标准化预处理和睡眠-觉醒检测，从体动记录中识别RBD，在多个独立队列中验证了泛化能力（AUROC 0.84-0.94）。

Comments 37 pages including Supplementary Information, 4 core figures, 1 supplementary figure. (v2: fixed a typo in Table 3 and made minor text edits; v3: post review)

DOI: 10.1038/s41746-026-02738-8
Journal ref: npj Digital Medicine (2026)

AI中文摘要

孤立性快速眼动睡眠行为障碍（iRBD）是α-突触核蛋白病的主要前驱标志，通常先于帕金森病、路易体痴呆或多系统萎缩的临床发作。虽然腕戴式体动记录仪通过捕捉异常夜间运动在大规模筛查中具有检测RBD的巨大潜力，但缺乏可靠高效的分析流程则无法使用。本研究提出了ActiTect，一个全自动开源机器学习工具，用于从体动记录中识别RBD。为确保跨异构采集设置的泛化能力，我们的流程包括稳健的预处理和自动睡眠-觉醒检测，以协调多设备数据并提取表征活动模式的生理可解释运动特征。模型开发基于78名个体的队列，在嵌套交叉验证下表现出强大的区分能力（AUROC = 0.95）。在盲法本地测试集（n = 31，AUROC = 0.86）和两个独立外部队列（n = 113，AUROC = 0.84；n = 57，AUROC = 0.94）上验证了泛化性。为评估现实世界鲁棒性，跨内部和外部队列的留一数据集交叉验证显示出一致的性能（AUROC范围 = 0.84-0.89）。补充稳定性分析表明，关键预测特征在数据集中保持可重复性，支持最终合并的多中心模型作为更广泛部署的稳健预训练资源。通过开源且易于使用，我们的工具促进了广泛采用，并促进了独立验证和协作改进，从而推动该领域向使用可穿戴设备的统一且可泛化的RBD检测模型发展。

英文摘要

Isolated rapid eye movement sleep behavior disorder (iRBD) is a major prodromal marker of $\alpha$-synucleinopathies, often preceding the clinical onset of Parkinson's disease, dementia with Lewy bodies, or multiple system atrophy. While wrist-worn actimeters hold significant potential for detecting RBD in large-scale screening efforts by capturing abnormal nocturnal movements, they become inoperable without a reliable and efficient analysis pipeline. This study presents ActiTect, a fully automated, open-source machine learning tool to identify RBD from actigraphy recordings. To ensure generalizability across heterogeneous acquisition settings, our pipeline includes robust preprocessing and automated sleep-wake detection to harmonize multi-device data and extract physiologically interpretable motion features characterizing activity patterns. Model development was conducted on a cohort of 78 individuals, yielding strong discrimination under nested cross-validation (AUROC = 0.95). Generalization was confirmed on a blinded local test set (n = 31, AUROC = 0.86) and on two independent external cohorts (n = 113, AUROC = 0.84; n = 57, AUROC = 0.94). To assess real-world robustness, leave-one-dataset-out cross-validation across the internal and external cohorts demonstrated consistent performance (AUROC range = 0.84-0.89). A complementary stability analysis showed that key predictive features remained reproducible across datasets, supporting the final pooled multi-center model as a robust pre-trained resource for broader deployment. By being open-source and easy to use, our tool promotes widespread adoption and facilitates independent validation and collaborative improvements, thereby advancing the field toward a unified and generalizable RBD detection model using wearable devices.
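
Since AUROC is the headline metric in this abstract, a short reminder of what it measures may help. The sketch below computes it from its pairwise definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The scores and labels are invented toy data, not results from the study.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy classifier outputs (hypothetical): three RBD cases, three controls.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(auroc(scores, labels))  # 8 of 9 pairs are ordered correctly
```

The quadratic loop is fine for cohorts of the size quoted here; rank-based formulas give the same value more efficiently on large datasets.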

2602.19591 2026-06-18 cs.LG cs.AI 版本更新

Detecting High-Potential SMEs with Heterogeneous Graph Neural Networks

使用异构图神经网络检测高潜力中小企业

Yijiashun Qi, Hanzhe Guo, Yijiazhen Qi

发表机构 * University of Michigan（密歇根大学）； The University of Hong Kong（香港大学）

AI总结提出SME-HGT异构图Transformer框架，利用公开数据构建包含公司、研究主题和政府机构的异构图，预测SBIR第一阶段获奖者能否进入第二阶段，AUPRC达0.621，优于基线模型。

Comments Accepted by ICIIS 2026

AI中文摘要

中小企业占美国企业的99.9%，贡献44%的经济活动，但系统性地识别高潜力中小企业仍是一个开放挑战。我们提出了SME-HGT，一个异构图Transformer框架，仅使用公开数据预测哪些SBIR第一阶段获奖者将进入第二阶段资助。我们构建了一个异构图，包含32,268个公司节点、124个研究主题节点和13个政府机构节点，通过约99,000条边连接三种语义关系类型。SME-HGT在时间分割测试集上达到0.621±0.003的AUPRC，在五个随机种子上优于MLP基线（0.590±0.002）和R-GCN（0.608±0.013）。在筛选深度为100家公司时，SME-HGT达到89.6%的精确率，比随机选择提升2.14倍。我们的时间评估协议防止信息泄露，对公开数据的依赖确保了可重复性。这些结果表明，公司、研究主题和资助机构之间的关系结构为中小企业潜力评估提供了有意义的信号，对政策制定者和早期投资者具有启示意义。

英文摘要

Small and Medium Enterprises (SMEs) constitute 99.9% of U.S. businesses and generate 44% of economic activity, yet systematically identifying high-potential SMEs remains an open challenge. We introduce SME-HGT, a Heterogeneous Graph Transformer framework that predicts which SBIR Phase I awardees will advance to Phase II funding using exclusively public data. We construct a heterogeneous graph with 32,268 company nodes, 124 research topic nodes, and 13 government agency nodes connected by approximately 99,000 edges across three semantic relation types. SME-HGT achieves an AUPRC of 0.621 ± 0.003 on a temporally-split test set, outperforming an MLP baseline (0.590 ± 0.002) and R-GCN (0.608 ± 0.013) across five random seeds. At a screening depth of 100 companies, SME-HGT attains 89.6% precision with a 2.14× lift over random selection. Our temporal evaluation protocol prevents information leakage, and our reliance on public data ensures reproducibility. These results demonstrate that relational structure among firms, research topics, and funding agencies provides meaningful signal for SME potential assessment, with implications for policymakers and early-stage investors.
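
The screening metrics quoted in this abstract can be made concrete with a small sketch. It assumes that lift is defined as precision at depth k divided by the base rate of positives, which the abstract does not state explicitly. Under that assumption, 89.6% precision with a 2.14× lift would correspond to a base rate of roughly 0.896 / 2.14 ≈ 0.42. The data below are invented.

```python
def precision_at_k(scores, labels, k):
    """Fraction of positives among the k highest-scored items."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


def lift_at_k(scores, labels, k):
    """Precision@k relative to the base rate, i.e. to random selection."""
    base_rate = sum(labels) / len(labels)
    return precision_at_k(scores, labels, k) / base_rate


# Toy data (hypothetical): 10 firms, 4 of which advance to the next phase.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

print(precision_at_k(scores, labels, 4))  # 0.75
print(lift_at_k(scores, labels, 4))       # 0.75 / 0.4, about 1.875
```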

2605.10083 2026-06-18 cs.LG 版本更新

Unlocking air traffic flow prediction through microscopic aircraft-state modeling

通过微观飞机状态建模解锁空中交通流量预测

Bin Wang, Anqi Liu, Jiangtao Zhao, Hina Birahmani, Yanyong Huang, Peilan He, Guiyuan Jiang, Feng Hong, Yanwei Yu, Yuanyuan Hou, Tianrui Li

发表机构 * Faculty of Information Science and Engineering（信息科学与工程学院）； Ocean University of China（中国海洋大学）； Sanya Oceanographic Institution（三亚海洋研究所）； Joint Laboratory of Data Science and Business Intelligence（数据科学与商务智能联合实验室）； Southwestern University of Finance and Economics（西南财经大学）； The Affiliated Hospital of Qingdao University（青岛大学附属医院）； School of Computing and Artificial Intelligence（计算机与人工智能学院）

AI总结本文提出AeroSense模型，通过微观飞机状态直接预测未来区域交通流量，提升高密度交通下的预测精度，替代传统时间序列方法。

AI中文摘要

终端空域短期空中交通流量预测对主动空中交通管理至关重要。现有方法主要将交通流量建模为聚合时间序列，然而交通动态是由飞机状态及其在连续空域中的相互作用决定的。此类聚合掩盖了包括飞机运动学、边界相互作用和控制意图在内的细粒度信息。本文提出AeroSense，一种从即时空域态势（表示为动态飞机状态集）直接预测未来交通流量的状态到流量建模框架。通过建立从微观飞机状态到未来区域交通流量的端到端映射，AeroSense在保持飞机级动态的同时，自然适应变化的交通密度，而无需依赖历史回溯窗口。在大规模真实数据集上的实验表明，AeroSense在高密度交通期间比基于聚合的预测方法具有持续的预测精度提升。这些发现表明，即时空域态势建模为传统基于时间序列的交通预测范式提供了有效的替代方案。

英文摘要

Short-term air traffic flow prediction in terminal airspace is essential for proactive air traffic management. Existing approaches predominantly model traffic flow as aggregated time series. However, traffic dynamics are governed by aircraft states and their interactions in continuous airspace. Such aggregation obscures fine-grained information, including aircraft kinematics, boundary interactions, and control intent. Here we present AeroSense, a state-to-flow modeling paradigm that predicts future traffic flow directly from instantaneous airspace situations represented as dynamic sets of aircraft states derived from ADS-B trajectories. By establishing an end-to-end mapping from microscopic aircraft states to future regional traffic flow, AeroSense preserves aircraft-level dynamics while naturally accommodating varying traffic density without relying on historical look-back windows. Experiments on a large-scale real-world dataset show that AeroSense achieves better predictive accuracy and robustness than aggregation-based forecasting approaches, particularly during high-density traffic periods. These findings suggest that aircraft-state situation modeling provides a promising alternative to conventional time-series forecasting in air traffic flow management.

2605.13566 2026-06-18 cs.LG 版本更新

Spatiotemporal downscaling and nowcasting of urban land surface temperatures with deep neural networks

基于深度神经网络的城市地表温度时空降尺度与临近预报

Solomiia Kurchaba, Angela Meyer

发表机构 * Department of Geoscience and Remote Sensing（地球科学与遥感系）； Delft University of Technology（代尔夫特理工大学）； School of Engineering and Computer Science（工程与计算机科学学院）； Bern University of Applied Sciences（伯尔尼应用科学大学）

AI总结本文提出利用深度神经网络结合静止和极轨卫星数据，实现高时空分辨率的城市地表温度场估计与临近预报，提升城市气候与生态研究的精度与时效性。

Comments Paper after publication in IEEE Access

DOI: 10.1109/ACCESS.2026.3700054
Journal ref: IEEE Access, vol. 14, pp. 85134-85151, 2026

AI中文摘要

地表温度（LST）是城市气候和生态研究等多种应用的关键变量。然而，现有卫星衍生的LST产品只能提供高空间分辨率或高时间分辨率之一，两者之间存在根本性权衡。为解决这一权衡，我们结合静止卫星和极轨卫星的观测数据，提供高空间和高时间分辨率（1公里，15分钟间隔）的LST场，并展示了其在日内LST预报中的应用。为了估计高时空分辨率的LST场，我们训练了一个U-Net模型，将SEVIRI/MSG（3公里，15分钟分辨率）的LST场映射到在空间和时间上与之匹配的Terra/Aqua MODIS（1公里，每天4次过境）LST场。该模型在人口超过100万的欧洲大城市的LST数据上训练，在留出测试集上达到RMSE=1.92°C和接近零的偏差MBE=0.01°C。第二步，我们提出基于ConvLSTM架构的LST临近预报模型，在降尺度后的LST场上训练，预报时效为15至75分钟。该临近预报模型优于持续性基准和气候滚动中位数基准，在所考虑的预报时效下RMSE为0.57至1.15°C，偏差范围为-0.1至0.14°C。此外，针对独立MODIS过境数据的额外验证确认了其稳健性能。我们的高时空分辨率LST预报模型可直接应用于基于卫星的业务化LST监测。

英文摘要

Land Surface Temperature (LST) is a key variable for various applications, such as urban climate and ecology studies. Yet, existing satellite-derived LST products provide either high spatial or high temporal resolution, resulting in a fundamental trade-off between the two. To address this trade-off, we combine observations from a geostationary and a polar orbiting satellite and provide LST fields at high spatial and high temporal resolution (1 km at 15-min intervals). We demonstrate their application for intraday forecasting of LSTs. To estimate LST fields at high spatiotemporal resolution, a U-Net model is trained to map LST fields from SEVIRI/MSG (3 km and 15 min resolution) to LST fields from Terra/Aqua MODIS (1 km, 4 overpasses per day) that are collocated in space and time. The presented model has been trained on LSTs across large European cities with a population exceeding 1 million inhabitants, and achieves an RMSE = $1.92$°C and near-zero bias MBE = $0.01$°C on the hold-out test set. As a second step, we present an LST nowcasting model based on the ConvLSTM architecture, trained across downscaled LST fields with forecast lead times of 15 to 75 minutes. The nowcasting model outperforms both a persistence benchmark and a climatological rolling-median benchmark, with RMSEs of $0.57$ to $1.15$°C for the considered lead times and biases ranging from $-0.1$ to $0.14$°C. An additional validation conducted against independent MODIS overpasses confirms robust performance. Our LST forecast model at high spatiotemporal resolution is directly applicable to operational satellite-based LST monitoring.
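
The two error metrics reported in this abstract, RMSE and MBE, answer different questions, which a small sketch makes visible. The sign convention used here (predicted minus observed) is an assumption, and the temperatures are toy values rather than data from the paper.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error: typical magnitude of the error."""
    errors = [p - o for p, o in zip(predicted, observed)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mbe(predicted, observed):
    """Mean bias error: average signed error (predicted minus observed)."""
    errors = [p - o for p, o in zip(predicted, observed)]
    return sum(errors) / len(errors)


# Toy temperatures in degrees Celsius (hypothetical values).
predicted = [21.0, 19.0, 25.0, 23.0]
observed = [20.0, 20.0, 24.0, 24.0]

print(rmse(predicted, observed))  # 1.0
print(mbe(predicted, observed))   # 0.0
```

Errors of +1 and -1 cancel in the bias but not in the RMSE, which is why a near-zero MBE can coexist with an RMSE of a degree or more.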

2605.21528 2026-06-18 cs.LG cs.AI 版本更新

A Reproducible Log-Driven AutoML Framework for Interpretable Pipeline Optimization in Healthcare Risk Prediction

可重复的基于日志的自动机器学习框架用于医疗风险预测中的可解释流水线优化

Rui Huang, Lican Huang

发表机构 * School of Basic Medicine, Hangzhou Normal University（杭州师范大学基础医学院）； Research Department, Hangzhou Domain Zones Technology Co., Ltd.（杭州域区技术有限公司）

AI总结本文提出了一种可重复的基于日志的自动机器学习框架，用于医疗风险预测中的可解释流水线优化，通过分析组件属性、交互和冗余性，提高了模型性能和稳定性。

AI中文摘要

准确且可重复的疾病风险预测仍然具有挑战性，由于异质特征、有限样本和严重的类别不平衡。本研究引入了yvsoucom-iterkit，一种确定性和基于日志的自动化机器学习框架，将流水线优化完全可重复地建模为配置级系统。每个流水线被编码为可追溯的日志实体，使能够分析组件属性、交互、相似性和跨种子鲁棒性。在超过18,000个流水线配置上对Pima Indians糖尿病和中风数据集的实验揭示了一个结构化且部分冗余的搜索空间，其中性能由一小部分相互作用的组件决定。随机森林重要性分析显示，增强（0.454）、模型选择（0.198）和不平衡处理（0.101）是Pima数据集的关键驱动因素，而不平衡处理主导中风（0.406）。组件相似性分析显示强冗余性，特征选择变体（biMax-biMean）表现出低RMS距离（0.0252），混合匹配无增强（0.0279），TomekLinks与无不平衡处理对齐（0.0325），而高斯噪声与无增强的差异更大（0.10）。该框架使用集成模型（加权F1 0.89，宏F1 0.88在Pima；加权F1 0.94在中风）实现了强且稳定的性能，而宏F1在中风上较低（0.67）由于类别不平衡。跨种子分析揭示了性能-鲁棒性权衡，集成模型的变异性低于SVM。这些结果表明，有效的AutoML优化可以聚焦于一组高影响的组件。

英文摘要

Accurate disease risk prediction is challenged by heterogeneous features, limited data, and class imbalance. This study presents yvsoucom-iterkit, a deterministic AutoML framework that models pipeline optimization as a configuration-level system with full reproducibility and traceable execution logs, enabling systematic analysis of component attribution, interactions, similarity, and cross-seed robustness. Experiments on the Pima Indians Diabetes and Stroke datasets across more than 18,000 pipeline configurations reveal a structured yet partially redundant search space, where performance is dominated by a small subset of interacting components. Ensemble models achieve stable performance, reaching a Weighted-F1 of 0.89 on Pima and 0.94 on Stroke. Macro-F1 reaches approximately 0.88 on Pima but drops to 0.6560 on Stroke due to severe imbalance. Cross-seed experiments show that ensembles reduce variance compared to single models. Friedman testing ($p < 0.05$) confirms significant ranking differences across configurations. Based on analysis of component attribution, interaction, and similarity, optimal configuration design reveals dataset-dependent behavior. For the Pima dataset, computational efficiency benefits from simplified search spaces where redundant components can be removed, with split ratio playing a key role. In contrast, the Stroke dataset requires enhanced imbalance-aware strategies, where RandomOverSampler improves Macro-F1 from 0.6560 to 0.6766. These findings demonstrate that effective AutoML optimization is achieved through optimal configuration design, where carefully constraining the search space to high-impact components can improve performance, stability, and interpretability while reducing unnecessary search complexity.
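
The gap between Weighted-F1 and Macro-F1 on an imbalanced dataset, as reported in this abstract, follows directly from how the two averages are formed. The sketch below uses an invented 18-versus-2 label split and is not a reproduction of the paper's experiments.

```python
def f1_per_class(y_true, y_pred, label):
    """F1 score of one class, treating it as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_and_weighted_f1(y_true, y_pred):
    """Macro-F1 averages classes equally; Weighted-F1 weights by class support."""
    classes = sorted(set(y_true))
    f1s = {c: f1_per_class(y_true, y_pred, c) for c in classes}
    support = {c: y_true.count(c) for c in classes}
    macro = sum(f1s.values()) / len(classes)
    weighted = sum(f1s[c] * support[c] for c in classes) / len(y_true)
    return macro, weighted


# Hypothetical imbalanced toy set: 18 negatives, 2 positives, one positive missed.
y_true = [0] * 18 + [1] * 2
y_pred = [0] * 18 + [1, 0]

macro, weighted = macro_and_weighted_f1(y_true, y_pred)
print(round(macro, 4), round(weighted, 4))  # 0.8198 0.9423
```

Missing a single minority-class case barely moves the weighted average but pulls the macro average down noticeably, since each class counts equally there.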

2606.07622 2026-06-18 cs.LG stat.AP 版本更新

Airport Terminal Passenger Queue Forecasting for Departure Gates and Security Checkpoints

机场航站楼登机口与安检点旅客排队预测

Juhwan Lee, Seokbin Yoon, Keumjin Lee, Hojong Baik, Seyeon Jung

发表机构 * Korea Aerospace University（韩国航空大学）； Korea Airports Corporation（韩国机场公社）

AI总结提出基于Transformer的框架，利用历史队列长度、等待时间和旅客吞吐量数据，预测登机口和安检点未来两小时的队列长度与等待时间，支持主动排队管理。

Comments 10 pages, 6 figures, accepted at DASC 2026

AI中文摘要

准确的机场航站楼旅客排队预测对于高效的离港运营至关重要，因为它能够实现主动的拥堵管理。然而，时变的旅客需求以及多个离港设施中异构的设施使用情况使得预测具有挑战性。在这项工作中，我们提出了一种旅客排队预测框架，该框架从运营数据中学习历史旅客流量模式。所提出的模型采用基于Transformer的架构，利用过去登机口和安检点的队列长度和等待时间，以及值机岛的旅客吞吐量，来捕捉时间依赖性和设施间相关性。学习到的表示被映射到两个设施特定的MLP头部，以预测登机口和安检点的队列长度和等待时间。实验结果表明，该模型能够准确预测未来两小时内的排队情况。所提出的方法为机场航站楼运营中的主动排队管理和人员重新分配提供了实用的实时决策支持。

英文摘要

Accurate passenger queue forecasting in airport terminals is essential for efficient departure operations, as it enables proactive congestion management. However, time-varying passenger demand and heterogeneous facility usage across multiple departure facilities make forecasting challenging. In this work, we propose a passenger queue forecasting framework that learns historical passenger flow patterns from operational data. The proposed model employs a Transformer-based architecture to capture temporal dependencies and inter-facility correlations using past queue length and waiting time at departure gates and security checkpoints, together with passenger throughput at check-in islands. The learned representations are mapped to two facility-specific prediction heads to predict queue length and waiting time at departure gates and security checkpoints. Experimental results demonstrate accurate forecasts up to two hours ahead. The proposed approach offers practical real-time decision support for proactive queue management and staff reallocation in airport terminal operations.

2204.14224 2026-06-18 cs.CV cs.LG eess.IV 版本更新

Investigation of Neural Network Methods for Reconstruction and Classification of Texture Images Under Conditions of Incomplete Information

不完全信息条件下纹理图像重建与分类的神经网络方法研究

Galymzhan Abdimanap, Kairat Bostanbekov, Abdelrahman Abdallah, Anel Alimova, Darkhan Kurmangaliyev, Daniyar Nurseitov, Tatyana Dedova, Larissa Balakay, Serik Nurakynov

发表机构 * Satbayev University（萨特巴耶夫大学）； Institute of Ionosphere LLP（电离层研究所）； Information Technology Department（信息技术部门）； Assiut University（阿西乌特大学）

AI总结提出结合目标检测、GAN（CRA）修复和Transformer/CNN分类的端到端框架，发现重建质量高（PSNR 28.7dB）但分类准确率仅53%，通过置信度混合集成将MCA从48%提升至58%，揭示生成模型产生语义模糊特征的问题。

Comments IEEE Access

DOI: 10.1109/ACCESS.2026.3705029

AI中文摘要

异质自然纹理的自动化分析常因物理损伤和数据丢失而受阻，这对计算机视觉构成了重大挑战。虽然深度学习在受控环境中已显示出成功，但其在信息不完全条件下对复杂地质材料的应用仍未被充分探索。本研究提出了一个用于高分辨率岩心样本图像修复和分类的集成框架。我们设计了一个端到端流水线，利用目标检测进行样本分割，随后使用具有上下文残差聚合（CRA）的生成对抗网络（GAN）进行图像修复，以重建缺失的高频细节。接着，我们在重建数据上评估了现代基于Transformer（Swin、ViT）和CNN架构的性能。实验揭示了重建质量与下游效用之间的关键分歧：尽管结构保真度高（PSNR 28.7 dB，FID 74.01），分类准确率却停滞在53%。为了改善少数类检测，我们提出了一种基于置信度的混合集成方法，将MCA从48%提升至58%。这些结果凸显了当前最先进生成模型的局限性，它们可能产生视觉上合理但语义模糊的特征（“幻觉”），从而混淆分类器。本工作深入探讨了图像重建质量与分类性能之间的依赖关系，为无损检测和材料科学领域的未来研究提供了可复现的基线。鉴于井间准确率仍处于49-53%范围，我们将所得到的系统定位为岩相解释的决策支持和筛选工具，而非完全自主的分类器。代码可在以下网址获取：https://github.com/GalymzhanAbdimanap/Lithology_recognition

英文摘要

The automated analysis of heterogeneous natural textures is frequently hindered by physical damage and data loss, presenting a significant challenge to computer vision. While deep learning has shown success in controlled environments, its application to complex geological materials under conditions of incomplete information remains underexplored. This study presents an integrated framework for the inpainting and classification of high-resolution core sample images. We propose an end-to-end pipeline that utilizes object detection for sample segmentation, followed by image inpainting using Generative Adversarial Networks (GANs) with Contextual Residual Aggregation (CRA) to reconstruct missing high-frequency details. Subsequently, we evaluate the performance of modern Transformer-based (Swin, ViT) and CNN architectures on the reconstructed data. Our experiments revealed a critical divergence between reconstruction quality and downstream utility: despite high structural fidelity (PSNR 28.7 dB, FID 74.01), classification accuracy plateaued at 53%. To improve minority-class detection, we propose a confidence-based hybrid ensemble that raises MCA from 48% to 58%. These results highlight the limitations of current state-of-the-art generative models, which may produce visually plausible but semantically ambiguous features ("hallucinations") that confound classifiers. This work provides insights into the dependencies between image reconstruction quality and classification performance, offering a reproducible baseline for future research in non-destructive testing and material science. Given that cross-well accuracy remains in the 49-53% range, we position the resulting system as a decision-support and screening tool for lithofacies interpretation rather than as a fully autonomous classifier. The code is available at https://github.com/GalymzhanAbdimanap/Lithology_recognition

2508.10178 2026-06-18 q-bio.QM cs.LG 版本更新

Estimating carbon pools in the European Shelf sea environment: replacing reanalysis by model-informed machine learning?

估算欧洲陆架海环境中的碳库：用模型指导的机器学习替代再分析？

Jozef Skakala

发表机构 * Plymouth Marine Laboratory（普利茅斯海洋实验室）； National Centre for Earth Observation（国家地球观测中心）

AI总结提出用深度集成神经网络学习可观测变量与海洋碳库的关系，以低成本替代昂贵再分析，在西北欧陆架海实现高效碳库预测并提供不确定性。

Comments 37 pages, 9 figures (+ 3 in the appendix), v3 - published version

DOI: 10.1029/2026JH001326
Journal ref: JGR - Machine Learning and Computation 3 (2026)

AI中文摘要

陆架海对经济和碳循环至关重要，但碳库观测往往稀疏或高度不确定。碳再分析（无论是同化叶绿素a等代理变量还是直接同化碳）可提供替代方案，但运行成本高昂。我们提出使用计算成本低的神经网络集成（即深度集成）来学习直接可观测（大气、河流和海洋）变量与海洋碳库之间的关系，该关系来自一个物理-生物地球化学耦合模型。深度集成在西北欧陆架海（NWES）物理-生物地球化学模型自由运行模拟上训练。训练后，使用来自NWES再分析的输入而非自由运行来运行深度集成，证明它能高效预测多个NWES碳库（如碎屑、浮游动物、异养细菌），且与再分析的一致性远优于自由运行，同时提供不确定性信息。我们进一步表明，当深度集成直接由同化到再分析中的观测驱动时，其表现同样良好，但碳库只能预测在观测位置和时间。我们关注结果的可解释性，并展示了深度集成在未来气候假设情景中的潜在应用。我们认为，模型指导的机器学习为昂贵的再分析提供了可行的替代方案，并可在观测缺失和/或高度不确定的地方补充观测。

英文摘要

Shelf seas are important for the economy and the carbon cycle, but shelf sea observations for carbon pools are often sparse, or highly uncertain. An alternative can be provided by carbon reanalyses (whether assimilating proxy variables, such as chlorophyll-$a$, or directly carbon), but these are often expensive to run. We propose to use a computationally cheap ensemble of neural networks (i.e. deep ensemble) to learn the relationship between the directly observable (atmospheric, riverine and ocean) variables and marine carbon pools from a coupled physics-biogeochemistry model. The deep ensemble was trained on a North-West European Shelf (NWES) physical-biogeochemistry model free run simulation. After training, the deep ensemble was run using inputs from the NWES reanalysis instead of the free run, demonstrating that it can efficiently predict several NWES carbon pools (e.g., detritus, zooplankton, heterotrophic bacteria) in much better agreement with the reanalysis than the free run, while also providing uncertainty information. We further show that the deep ensemble performs similarly well when it is driven directly by the observations assimilated into the reanalysis, with the limitation that carbon pools can then be predicted only at the observed locations and times. We focus on explainability of the results and demonstrate potential use of the deep ensembles for future climate what-if scenarios. We suggest that model-informed machine learning presents a viable alternative to expensive reanalyses and could complement observations, wherever they are missing and/or highly uncertain.
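
The way a deep ensemble yields both a prediction and an uncertainty estimate can be sketched briefly: each member predicts independently, the mean serves as the estimate and the spread across members as the uncertainty. In the sketch below, three simple functions stand in for trained networks; they are purely hypothetical and unrelated to the NWES model.

```python
import statistics


def ensemble_predict(members, x):
    """Return (mean prediction, spread across members) for input x."""
    predictions = [member(x) for member in members]
    return statistics.mean(predictions), statistics.pstdev(predictions)


# Stand-ins for independently trained networks (hypothetical toy functions).
members = [
    lambda x: 2.0 * x,
    lambda x: 2.0 * x + 0.3,
    lambda x: 2.0 * x - 0.3,
]

mean, spread = ensemble_predict(members, 1.0)
print(round(mean, 3), round(spread, 3))  # 2.0 0.245
```

Where the members disagree more, the spread grows, which gives a simple indication of where predictions should be trusted less.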

2511.00366 2026-06-18 stat.ML cs.CE cs.LG 版本更新

IPSL-AID：用于从全球到区域尺度气候降尺度的生成扩散模型

Kishanthan Kingston, Olivier Boucher, Freddy Bouchet, Pierre Chapel, Rosemary Eade, Jean-Francois Lamarque, Redouane Lguensat, Kazem Ardaneh

发表机构 * Climate Modeling Center（气候建模中心）； Sorbonne University（索邦大学）； CNRS（法国国家科学研究中心）； IPSL ； Paris（巴黎）； France（法国）

AI总结提出基于去噪扩散概率模型的IPSL-AID工具，利用ERA5再分析数据从粗分辨率输入生成0.25°温度、风和降水场，并建模细尺度特征概率分布以量化不确定性，准确重建统计分布、极端事件和空间结构。

Comments 17 pages, 12 figures, submitted to Climate Informatics 2026, to appear in Environmental Data Science

2604.14906 2026-06-18 physics.bio-ph cs.LG 版本更新

Unraveling the Mechanism of Drug Binding to SARS-CoV-2 RNA Pseudoknot with Thermodynamics-Driven Machine Learning

用热力学驱动的机器学习揭示药物与SARS-CoV-2 RNA假结的结合机制

Mariia Ivonina, Jakub Rydzewski

发表机构 * Platform of Inter/Transdisciplinary Energy Research, Kyushu University（九州大学跨学科能源研究平台）； Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University（尼古拉·哥白尼大学物理、天文学与信息学学院物理研究所）

AI总结本研究利用热力学驱动的机器学习方法（谱映射，spectral map）从全原子分子动力学轨迹中学习集体变量，揭示了配体结合对SARS-CoV-2 RNA假结拓扑选择性去稳定化的机制，并发现质子化状态是模拟RNA靶向药物作用的关键因素。

AI中文摘要

SARS-CoV-2 RNA中的假结二级结构通过$-1$程序性核糖体移码（$-1$ PRF）调控蛋白质合成，该机制使病毒能从重叠阅读框产生结构蛋白和非结构蛋白。该假结表现出穿线和非穿线两种长寿命拓扑结构。配体结合对其折叠的影响是开发$-1$ PRF小分子抑制剂的关键过程。通过引入捕捉相应最慢动力学模式的集体变量（CVs），可以促进通过无偏分子动力学（MD）模拟理解这一过程。这里，我们使用谱映射（spectral map，SM），一种热力学驱动的机器学习技术，直接从SARS-CoV-2 RNA假结与$-1$ PRF抑制剂merafloxacin及其两种结构类似物（中性和离子化形式）复合物的全原子MD轨迹中学习这样的CVs。从学习到的CVs导出的自由能景观（FELs）表明，配体诱导的去稳定化是拓扑选择性的。在穿线假结中，抑制剂去稳定化S2茎，而在非穿线假结中，去稳定化发生在S1和S3茎。此外，每个配体重塑FEL的程度与实验报道的抗病毒效力相匹配，而质子化状态在相同RNA拓扑内定性地改变动力学。总体而言，我们的结果显示了假结拓扑、配体类型和质子化状态如何共同影响病毒RNA的慢构象动力学，并确立了生理质子化作为模拟RNA靶向药物作用的关键因素。

英文摘要

The pseudoknot secondary structure in SARS-CoV-2 RNA is essential for regulating protein synthesis through $-1$ programmed ribosomal frameshifting ($-1$ PRF), a mechanism that allows the virus to generate both structural and non-structural proteins from overlapping reading frames. This pseudoknot exhibits both threaded and unthreaded long-lived topologies. The influence of ligand binding on its folding is a process critical for the development of $-1$ PRF small-molecule inhibitors. Understanding this process through unbiased molecular dynamics (MD) simulations can be facilitated by introducing collective variables (CVs) that capture the corresponding slowest dynamical modes. Here, we use spectral map (SM), a thermodynamics-driven machine learning technique, to learn such CVs directly from all-atom MD trajectories of the SARS-CoV-2 RNA pseudoknot in complex with the $-1$ PRF inhibitor merafloxacin and its two structural analogs in neutral and ionized forms. Free-energy landscapes (FELs) derived from the learned CVs indicate that ligand-induced destabilization is topology-selective. In the threaded pseudoknot, the inhibitors destabilize the S2 stem, while in the unthreaded pseudoknot, destabilization occurs in the S1 and S3 stems. Furthermore, the extent to which each ligand reshapes the FEL matches experimentally reported antiviral potency, whereas the protonation state qualitatively alters dynamics within the same RNA topology. Overall, our results show how pseudoknot topology, ligand type, and protonation state collectively influence the slow conformational dynamics of viral RNA and establish physiological protonation as a critical factor for modeling RNA-targeted drug action.
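
A free-energy landscape along a collective variable is commonly estimated from the sampled probability density via F(s) = -kT ln p(s). The sketch below applies this relation to a one-dimensional histogram of toy CV samples. It assumes energies in units of kT and a fixed bin width, and it does not reproduce the spectral map method or any data from the paper.

```python
import math
from collections import Counter


def free_energy_profile(cv_samples, bin_width, kT=1.0):
    """Histogram estimate of F(s) = -kT ln p(s), shifted so the minimum is 0."""
    counts = Counter(math.floor(s / bin_width) for s in cv_samples)
    total = sum(counts.values())
    energies = {b: -kT * math.log(c / total) for b, c in counts.items()}
    f_min = min(energies.values())
    return {b: e - f_min for b, e in sorted(energies.items())}


# Toy CV samples (hypothetical): bin 0 is visited twice as often as bin 1.
samples = [0.1, 0.2, 0.3, 0.15, 1.1, 1.2]
profile = free_energy_profile(samples, bin_width=1.0)
print(profile)  # bin 0 at 0.0, bin 1 at about ln(2) = 0.693 kT
```

A state visited half as often sits ln 2 ≈ 0.69 kT higher, so a ligand that shifts populations between basins shows up directly as a reshaped profile.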

2604.22476 2026-06-18 cs.CV cs.LG 版本更新

All Eyes on the Workflow: Automated and Efficient Event Discovery from Video Streams

全神贯注于工作流：从视频流中自动高效发现事件

Marco Pegoraro, Jonas Seng, Dustin Heller, Wil M. P. van der Aalst, Kristian Kersting

发表机构 * Chair of Process and Data Science, RWTH Aachen University（过程与数据科学教授席位，亚琛工业大学）； Artificial Intelligence & Machine Learning Lab, Technical University of Darmstadt（人工智能与机器学习实验室，达姆施塔特技术大学）

AI总结提出SnapLog方法，利用图像嵌入和帧间相似矩阵进行时间分割，结合广义少样本分类从视频中提取事件数据，生成可解释的带标签时间戳帧序列。

Comments 18 pages, 6 figures, 1 table, 27 references

AI中文摘要

业务流程管理和流程挖掘等学科通过基于记录的事件数据发现流程见解来帮助组织。然而，流程分析的一个障碍是数据多模态性：例如，视频形式的数据不能直接解释为事件。现有方法依赖于活动标签字典作为输入，无法提供逐帧标签解释，或依赖于过时的计算机视觉技术。在这项工作中，我们提出了SnapLog，一种通过使用图像嵌入将帧转换为特征向量，并通过帧间相似矩阵进行时间分割来从视频中提取事件数据的方法。然后使用广义少样本分类为视频片段分配标签，生成可解释为事件的带标签、时间戳的子帧序列。传统的流程挖掘技术可用于分析结果数据。我们表明，我们的方法生成的日志准确反映了视频中的流程。

英文摘要

Disciplines such as business process management and process mining aid organizations by discovering insights about processes on the basis of recorded event data. However, an obstacle to process analysis is data multi-modality: for instance, data in video form are not directly interpretable as events. Existing approaches rely on a dictionary of activity labels as input, cannot provide frame-by-frame labeling explanations, or rely on superseded computer vision techniques. In this work, we present SnapLog, an approach to extract event data from videos by converting frames to feature vectors using image embeddings and performing temporal segmentation through frame-wise similarity matrices. A generalized few-shot classification is then used to assign labels to the video segments, yielding labeled, timestamped sub-sequences of frames that are interpretable as events. Conventional process mining techniques can be used to analyze the resulting data. We show that our approach produces logs that accurately reflect the process in the videos.
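
Illustrative sketch (not from the paper): the temporal segmentation step described in the abstract can be pictured as computing a frame-wise cosine similarity matrix and cutting the video wherever neighbouring frames become dissimilar. The function names, the toy 2-D embeddings, and the threshold-based cut rule are assumptions made for this example; the abstract does not say how SnapLog derives segment boundaries from the matrix.

```python
import numpy as np


def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Frame-wise cosine similarity; each row of `embeddings` is one frame's feature vector."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


def segment_frames(embeddings: np.ndarray, threshold: float = 0.5) -> list:
    """Split frames into contiguous half-open segments [start, end).

    Hypothetical cut rule: start a new segment whenever the similarity
    between frame t-1 and frame t drops below `threshold`.
    """
    sim = cosine_similarity_matrix(embeddings)
    segments, start = [], 0
    for t in range(1, len(embeddings)):
        if sim[t - 1, t] < threshold:
            segments.append((start, t))
            start = t
    segments.append((start, len(embeddings)))
    return segments


# Toy embeddings: two frames of one activity followed by three frames of another.
frames = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.05, 0.95]])
segments = segment_frames(frames)
```

Each resulting segment would then be passed to a classifier to receive an activity label; that step is omitted here because the abstract gives no detail on the few-shot classifier.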


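2605.22845 2026-06-18 cs.CE cs.LG version update

Adv-TGD: Adversarial Text-Guided Diffusion for Face Recognition Impersonation Attacks

Omid Ahmadieh, Nima Karimian

Affiliations: University of South Florida, Bellini College of Artificial Intelligence, Cybersecurity and Computing

AI summary: Proposes the Adv-TGD framework, which uses Stable Diffusion and LoRA fine-tuning to generate photorealistic adversarial faces, achieving high-success-rate identity impersonation attacks while preserving visual quality, with an average ASR of 85.90%.

AI abstract (translated from Chinese)

The widespread adoption of face recognition (FR) technology raises serious privacy concerns, because facial data can be exploited without consent. To address this challenge, we propose Adv-TGD, a generative adversarial attack framework that synthesizes photorealistic faces able to impersonate target identities and deceive face recognition systems. Built on Stable Diffusion, Adv-TGD performs per-sample LoRA fine-tuning conditioned on concise text prompts to generate natural yet adversarially manipulated identities. Unlike conventional identity attack methods, our approach optimizes lightweight cross-attention adapters for each source-target pair within a fixed-timestep denoising process. Latent blending is constrained by a face-local heatmap mask to ensure spatially precise identity manipulation while preserving non-sensitive regions. We introduce a composite objective that combines masked epsilon-MSE reconstruction, thresholded identity divergence in the FR embedding space, directional feature alignment, and source-similarity suppression to balance adversarial strength and visual realism. Optionally, LLaVA-generated attribute prompts enhance fine-grained semantic detail without reintroducing identity cues. Under the black-box evaluation protocol, Adv-TGD reaches an average attack success rate (ASR) of 85.90% across IR152, IRSE50, MobileFace, and FaceNet, exceeding the semantic SOTA baseline Adv-CPG by 6.25 percentage points, the diffusion-based makeup method DiffAIM by 3 percentage points, and the noise-based P3-Mask by 16 percentage points. Despite its strong attack performance, Adv-TGD maintains high visual fidelity (PSNR = 28.18 dB, SSIM = 0.981). We further demonstrate the framework's flexibility by extending it successfully to an in-the-wild dataset (LADN), general object classification (ImageNet), and a transformer-based diffusion model (FLUX.1).

English abstract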

The widespread adoption of face recognition (FR) technologies raises serious privacy concerns, as facial data can be exploited without consent. To address this challenge, we propose Adv-TGD, a generative adversarial attack framework that synthesizes photorealistic faces capable of impersonating target identities and deceiving face recognition systems. Built upon Stable Diffusion v2.1, Adv-TGD performs per-sample LoRA fine-tuning conditioned on concise textual prompts to generate natural yet adversarially manipulated identities. Unlike conventional identity attack approaches, our method optimizes lightweight cross-attention adapters for each source-target pair within a fixed-timestep denoising process. Latent blending is constrained by a face-local heatmap mask to ensure spatially precise identity manipulation while preserving non-sensitive regions. We introduce a composite objective that integrates masked epsilon-MSE reconstruction, thresholded identity divergence in FR embedding space, directional feature alignment, and source-similarity suppression to balance adversarial attack and visual realism. Optionally, LLaVA-generated attribute prompts enhance fine-grained semantic details without reintroducing identity cues. Under the black-box evaluation protocol, Adv-TGD attains an average attack success rate (ASR) of 85.90% across IR152, IRSE50, MobileFace, and FaceNet, surpassing the semantic SOTA baseline Adv-CPG by 6.25 points, the diffusion-based makeup method DiffAIM by 3 points, and the noise-based P3-Mask by 16 points. Despite its strong attack efficacy, Adv-TGD preserves high visual fidelity (PSNR = 28.18 dB, SSIM = 0.981). Furthermore, we demonstrate the flexibility of our framework by successfully extending it to in-the-wild datasets (LADN), general object classification (ImageNet), and transformer-based diffusion models (FLUX.1).


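2606.12816 2026-06-18 quant-ph cs.ET cs.LG version update

Graph Reinforcement Learning for Calibration-Aware Quantum Circuit Routing

Yash Vardhan Tomar, Dheeraj Peddireddy

Affiliations: University of California, Berkeley; National Institute of Standards and Technology

AI summary: Proposes a calibration-aware quantum circuit routing method based on graph reinforcement learning that selects SWAP operations using IBM Heron r2 calibration data, reaching a mean fidelity of 0.727 on MQT Bench circuits versus 0.440 for SABRE-best20.

AI abstract (translated from Chinese)

Quantum circuit routing is a key step in compiling programs for noisy intermediate-scale quantum processors. Routes that look efficient under standard overhead metrics can still lose fidelity when they pass through poorly calibrated couplers. We study a calibration-aware graph reinforcement-learning router that uses same-day IBM Heron r2 calibration data to choose hardware-edge SWAPs. We train the policy with proximal policy optimization and evaluate it by exact simulated fidelity on nine Munich Quantum Toolkit (MQT) Bench circuits and three calibration snapshots. Across these evaluations, the pooled mean exact fidelity is $0.727$, compared with $0.440$ for SABRE-best20 and $0.481$ for target-aware SABRE. The fidelity gains come with higher routed two-qubit counts and are concentrated in the 5-qubit and 8-qubit circuit families; under the fixed tree action graph, all 10-qubit families favor SABRE-best20. Overall, our results show that calibration-aware learned routing can improve fidelity beyond gate-count-driven compilation.

English abstract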

Quantum circuit routing is a key step in compiling programs for noisy intermediate-scale quantum processors. Routes that appear efficient by standard overhead metrics can still lose fidelity when they pass through poorly calibrated couplers. We study a calibration-aware graph reinforcement-learning router that uses same-day IBM Heron r2 calibration data to choose hardware-edge SWAPs. We train the policy with proximal policy optimization and evaluate it with exact simulated fidelity across nine Munich Quantum Toolkit (MQT) Bench circuits and three calibration snapshots. Across these evaluations, pooled mean exact fidelity is $0.727$, compared with $0.440$ for SABRE-best20 and $0.481$ for target-aware SABRE. We observed that fidelity gains came with higher routed two-qubit counts and were concentrated in the 5-qubit and 8-qubit circuit families; under the fixed tree action graph, all 10-qubit families favored SABRE-best20. Overall, our results show that calibration-aware learned routing can improve fidelity beyond gate-count-driven compilation.


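2606.17276 2026-06-18 cs.IR cs.LG version update

From Mechanistic to Compositional Interpretability

Ward Gauderis, Thomas Dooms, Steven T. Homer, Kola Ayonrinde, Geraint A. Wiggins

Affiliations: UK AI Security Institute

AI summary: Proposes a compositional interpretability framework that uses category-theoretic principles to address the lack of objective verification in mechanistic interpretability, decomposes explanation quality into faithfulness and complexity, introduces compressive refinement for model simplification, and proves that a parsimony criterion guarantees human-aligned explanations.

AI abstract (translated from Chinese)

Mechanistic interpretability aims to explain neural models by reverse-engineering their learned computational structure, but without a formal framework its explanations cannot be objectively verified. This paper introduces compositional interpretability, a category-theoretic framework grounded in the principles of compositionality and minimum description length. A compositional interpretation is a pair of syntactic and semantic mappings that must satisfy a consistency condition. Explanation quality is decomposed into faithfulness and complexity, interpretability is cast as a constrained optimization problem, and compressive refinement is introduced to systematically restructure models into simpler parts. Finally, the paper shows that, under a parsimony criterion, syntactic compression theoretically guarantees more concise, human-aligned explanations. The framework situates prominent mechanistic methods as subclasses of refinement and clarifies why their compressibility heuristics align with human interpretability. The paper provides a measurable, optimizable foundation for automating the discovery and evaluation of mechanistic explanations.

English abstract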

Mechanistic interpretability aims to explain neural model behaviour by reverse-engineering learned computational structure into human-understandable components. Without a formal framework, however, mechanistic explanations cannot be objectively verified, compared, or composed. We introduce compositional interpretability, a category-theoretic framework grounded in the principles of compositionality and minimum description length. Compositional interpretations are pairs of syntactic and semantic mappings that must commute to enforce consistency between a model's decomposition and its observed behaviour. We deconstruct explanation quality into measures of faithfulness and complexity to cast interpretability as a constrained optimisation problem, and introduce compressive refinement to systematically restructure models into simpler parts without altering their function. Finally, we derive a parsimony criterion under which syntactic compression theoretically guarantees more concise, human-aligned explanations. Our framework situates prominent mechanistic methods as subclasses of refinement, and clarifies why their compressibility heuristics tend to align with human interpretability. Our work provides a measurable, optimisable blueprint for automating the discovery and evaluation of mechanistic explanations.


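2410.21258 2026-06-18 quant-ph cs.CC cs.LG version update

Provable quantum speedups for computing persistence in topological data analysis

Casper Gyurik, Alexander Schmidhuber, Robbie King, Vedran Dunjko, Ryu Hayakawa

Affiliations: applied Quantum algorithms (aQa), Leiden University, 2300 RA Leiden, The Netherlands; Center for Theoretical Physics, Massachusetts Institute of Technology, Cambridge, USA; Department of Computing; Yukawa Institute for Theoretical Physics & The Hakubi Center, Kyoto University, Japan

AI summary: Proposes an efficient quantum algorithm for deciding the persistence of holes in topological data analysis and proves that the problem is $\mathsf{BQP}_1$-hard, implying an exponential quantum speedup under standard complexity assumptions.

Comments: 17 pages

DOI: 10.1103/gvys-hl8h
Journal ref: PRX Quantum 7, 020361 (2026)

AI abstract (translated from Chinese)

Topological data analysis (TDA) aims to extract noise-robust features from a data set by examining the number and persistence of holes in its topology. We give an efficient quantum algorithm for a computational problem closely related to a core task in TDA: deciding whether a given hole persists across different length scales. We further prove that the problem itself is $\mathsf{BQP}_1$-hard, which means a classical solution is extremely unlikely; this contrasts with all previous quantum approaches to TDA, where the problems were either also intractable for quantum computers or lacked a rigorous proof of classical hardness. The result implies an exponential quantum speedup for this problem under standard complexity-theoretic assumptions. Our approach encodes the persistence of a hole in a variant of the guided sparse Hamiltonian problem, in which the guiding state is constructed from a harmonic representative of the hole.

English abstract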

Topological data analysis (TDA) aims to extract noise-robust features from a data set by examining the number and persistence of holes in its topology. We provide an efficient quantum algorithm for a computational problem closely related to a core task in TDA -- determining whether a given hole persists across different length scales. Further, we prove the problem itself is $\mathsf{BQP}_1$-hard, implying that a classical solution is extremely unlikely; this stands in contrast to all previous quantum approaches to TDA, where the problems were also intractable for quantum computers, or where a rigorous proof of classical hardness still remains open. This result implies an exponential quantum speedup for this problem under standard complexity-theoretic assumptions. Our approach relies on encoding the persistence of a hole in a variant of the guided sparse Hamiltonian problem, where the guiding state is constructed from a harmonic representative of the hole.


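2604.23716 2026-06-18 cs.AI cs.IT cs.LG cs.MA math.IT version update

Information-Theoretic Measures in AI: A Practical Decision Guide

Nikolaos Al. Papadopoulos, Konstantinos E. Psannis

Affiliations: Department of Applied Informatics, University of Macedonia

AI summary: Provides a practical decision framework for seven information-theoretic measures, organized around three key questions per measure (the question it answers and its AI context, the appropriate estimator, and the most dangerous misuse), accompanied by a flowchart and a decision table.

Comments: 25 pages, 2 tables, 1 figure. Submitted to Entropy (MDPI)

AI abstract (translated from Chinese)

Information-theoretic (IT) measures are everywhere in artificial intelligence: entropy drives decision-tree splits and uncertainty quantification, cross-entropy is the default classification loss, mutual information underpins representation learning and feature selection, and transfer entropy reveals directed influence in dynamical systems. A second, less mature family of measures, namely integrated information (Phi), effective information (EI), and autonomy, has emerged for characterizing agent complexity. Despite their wide adoption, the choice of measure is often disconnected from estimator assumptions, failure modes, and safe inferential claims. This paper provides a practical decision framework for all seven measures, organized around three guiding questions for each: (i) what question the measure answers, and in which AI context; (ii) which estimator suits the data type and dimensionality; and (iii) what the most dangerous misuse is. The framework is realized in two complementary artifacts: a measure-selection flowchart and a master decision table. For each measure we cover both AI/ML and decision-making-agent application domains, and use standardized Bridge Boxes to link IT quantities to cognitive constructs. Three worked examples show the framework applied to concrete practitioner scenarios spanning representation learning, temporal influence analysis, and evolved agent complexity.

English abstract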

Information-theoretic (IT) measures are ubiquitous in artificial intelligence: entropy drives decision-tree splits and uncertainty quantification, cross-entropy is the default classification loss, mutual information underpins representation learning and feature selection, and transfer entropy reveals directed influence in dynamical systems. A second, less consolidated family of measures, integrated information (Phi), effective information (EI), and autonomy, has emerged for characterizing agent complexity. Despite wide adoption, measure selection is often decoupled from estimator assumptions, failure modes, and safe inferential claims. This paper provides a practical decision framework for all seven measures, organized around three prescriptive questions for each: (i) what question does the measure answer and in which AI context; (ii) which estimator is appropriate for the data type and dimensionality; and (iii) what is the most dangerous misuse. The framework is operationalized in two complementary artifacts: a measure-selection flowchart and a master decision table. We cover both AI/ML and decision-making agent application domains per measure, with standardized Bridge Boxes linking IT quantities to cognitive constructs. Three worked examples illustrate the framework on concrete practitioner scenarios spanning representation learning, temporal influence analysis, and evolved agent complexity.
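
Illustrative sketch (not from the paper): three of the measures named in the abstract, computed in bits from known discrete probability tables. The function names are chosen for this example. These are plug-in formulas applied to exact probabilities; estimating the same quantities from finite samples raises the estimator-choice issues the abstract highlights, which this sketch does not address.

```python
import numpy as np


def entropy(p) -> float:
    """Shannon entropy H(p) in bits; zero-probability outcomes contribute 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log2(p[nz])))


def cross_entropy(p, q) -> float:
    """Cross-entropy H(p, q) in bits; assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log2(q[nz])))


def mutual_information(joint) -> float:
    """I(X;Y) = H(X) + H(Y) - H(X,Y), computed from a joint probability table."""
    joint = np.asarray(joint, dtype=float)
    h_x = entropy(joint.sum(axis=1))  # marginal over columns
    h_y = entropy(joint.sum(axis=0))  # marginal over rows
    return h_x + h_y - entropy(joint.ravel())


fair_coin = [0.5, 0.5]
independent = [[0.25, 0.25], [0.25, 0.25]]  # X and Y carry no information about each other
identical = [[0.5, 0.0], [0.0, 0.5]]        # Y is a copy of X
```

With these tables, the fair coin has one bit of entropy, the independent pair has zero mutual information, and the identical pair has mutual information equal to the entropy of either variable.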


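2605.17131 2026-06-18 cs.CV cs.AI cs.LG version update

A Survey on Deep Learning Architectures for Point Cloud Classification and Segmentation

Minhas Kamal, Hiranya Garbha Kumar, Balakrishnan Prabhakaran

Affiliations: State University of New York at Albany

AI summary: Systematically surveys deep learning architectures for point cloud classification and segmentation: analyzes the structural properties of point cloud data, categorizes work by architecture, evaluates performance on mainstream benchmarks, and identifies open challenges and future directions.

Comments: We reviewed a decade of advancements in point cloud processing: trace the evolution of the field from its foundational roots to the modern SOTA, analyze how diverse architectures overcome the inherent geometric challenges of 3D data, and map out critical research gaps alongside promising future directions. GitHub: https://github.com/MinhasKamal/DeepLearningForPointCloud

DOI: 10.1145/3815180
Journal ref: ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 2026

AI abstract (translated from Chinese)

Point clouds are the most widely adopted format for representing 3D shapes and scenes because of their simplicity and geometric fidelity. However, their inherently unordered and irregular nature, compounded by sensor noise and occlusion, poses unique challenges for machine-learning-based methods. Various strategies have been developed in response, including conversion to ordered formats, extraction of local geometric features, and permutation-invariant or self-attention-based processing. In this paper we focus on deep learning models for three fundamental tasks in 3D vision: point cloud classification, part segmentation, and semantic segmentation. We first define point cloud data formally and then discuss its structural characteristics in depth. Next, we categorize notable works by their backbone structure and evaluate their performance on popular benchmarks. Beyond empirical comparison, we offer insights into architectural innovations and limitations. We also outline open challenges and promising future directions in 3D point cloud understanding.

English abstract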

The point cloud is the most widely adopted format for representing 3D shapes and scenes due to its simplicity and geometric fidelity. However, its inherently unordered and irregular nature, exacerbated by sensor noise and occlusions, introduces unique challenges for machine-learning-based methods. To address these issues, diverse strategies have been developed, including conversion to an ordered format, extraction of local geometry, and permutation-invariant or self-attention-based processing. In this paper, we focus on deep learning models for three fundamental tasks in 3D vision: point cloud classification, part segmentation, and semantic segmentation. We begin by formally defining point cloud data, followed by an in-depth discussion of its structural characteristics. Then, we categorize notable works based on their backbone structure and evaluate their performance on popular benchmarks. Beyond empirical comparison, we offer insights into architectural innovations and limitations. We also outline open challenges and promising future directions for 3D point cloud understanding.


2605.25929 2026-06-18 cs.MA cs.LG version update

Multi-Agent Systems are Mixtures of Experts: Who Becomes an Influencer?

Franka Bause, Jonas Niederle, Martin Pawelczyk, Rebekka Burkholz

Affiliations: CISPA Helmholtz Center for Information Security; Faculty of Computer Science, University of Vienna

AI summary: Analyzes deliberation among multi-agent LLMs through the Friedkin-Johnsen (FJ) opinion dynamics model, shows that input-dependent FJ parameters make the system a mixture of experts, and examines how influencers emerge based on self-confidence, perceived confidence, and alignment of initial opinions.

Comments: Accepted at the 2nd Workshop on Compositional Learning at ICML 2026

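2606.17454 2026-06-18 cs.AI cs.LG version update

Dissecting model behavior through agent trajectories

Gaurav Gupta, Vatshank Chaturvedi, Jun Huan, Anoop Deoras

Affiliations: AWS AI Labs

AI summary: Introduces the concept of the "intent-execution gap" and designs the Simple Strands Agent (SSA) harness, revealing behavioral differences between models in autonomous problem solving through an analysis of 138k trajectories.

Comments: 106 pages, 50 Figures, 16 Tables

AI abstract (translated from Chinese)

AI agent performance is not just a modeling problem; it is fundamentally a systems problem. A model's advanced capabilities are realized through an agent harness, so a gap between the model's assumptions and the harness's behavior can easily keep the model's full capabilities from translating into agent performance. We formalize this as the "intent-execution gap": the mismatch between what the model intends and what the harness executes, and vice versa. We argue that minimizing this gap is as important as other aspects of harness design, such as tools and execution loops. To illustrate the impact of this harness-model alignment, we develop a simple, customizable harness called "Simple Strands Agent" (SSA). SSA aims to find the common patterns that generalize across model families (such as Claude, Gemini, GPT, Grok, and Qwen), together with a small number of model-specific preferences. We make two contributions: (i) we reproduce or improve on the pass@1 performance reported by different model-provider families on popular agentic benchmarks (SWE-Pro, SWE-Verified, and Terminal-Bench-2); (ii) based on an analysis of 138k trajectories generated by SSA, we look beyond pass@1 numbers, which tend to be relatively even across frontier models. By representing agent trajectories in code state-spaces, we observe model-level differences in problem-solving behavior. Finer-grained metrics such as edit frequency, testing activity, and phase transitions reveal how individual models allocate effort across the stages of autonomous problem solving.

English abstract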

AI agent performance is not just a modeling problem, it is fundamentally a systems problem. The advanced capabilities of models are realized through agent harnesses. Therefore, a gap between model assumptions and harness behavior can easily prevent the model's full capabilities from translating into agent performance. We formalize this as the "intent-execution" gap: the mismatch between what the model intends and what the harness executes, and vice versa. We argue that minimizing this intent-execution gap is as important as other aspects of harness design such as tools and execution loops. To illustrate the impact of this harness-model alignment, we develop a simple and customizable harness called "Simple Strands Agent" (SSA). SSA aims to find the bulk of common patterns which generalize across different model families (such as Claude, Gemini, GPT, Grok, Qwen), as well as a small number of model-specific preferences. We make two contributions: (i) we reproduce or improve on the pass@1 performance reported by diverse model-provider families on popular agentic benchmarks (SWE-Pro, SWE-Verified and Terminal-Bench-2), and (ii) building on an analysis of 138k trajectories generated by SSA, we look beyond the pass@1 numbers which tend to be relatively even across frontier models. By representing agent trajectories in code state-spaces, we observe model-level differences in problem-solving behavior. Finer-grained metrics such as edit frequency, testing activity, and phase-transitions reveal how individual models allocate effort across different stages of autonomous problem solving.


2510.15300 2026-06-18 cs.LG version update

DFCA: Decentralized Federated Clustering Algorithm

Jonas Kirch, Sebastian Becker, Tiago Koketsu Rodrigues, Stefan Harmeling

Affiliations: Fraunhofer Institute for Software and Systems Engineering; Lamarr Institute for Machine Learning and AI

2601.18637 2026-06-18 quant-ph cs.LG stat.ML version update

Universality of Many-body Projected Ensemble for Learning Quantum Data Distribution

Quoc Hoan Tran, Koki Chinzei, Yasuhiro Endo, Hirotaka Oshima

Affiliations: Quantum Laboratory, Fujitsu Research, Fujitsu Limited, Kawasaki, Kanagawa 211-8588, Japan

Comments: 21 pages, 6 figures (added Github repository)

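2405.14273 2026-06-18 cs.LG cs.AI math.OC version update

Exact Solution to Data-Driven Inverse Optimization of MILPs in Finite Time via Gradient-Based Methods

Akira Kitaoka

Affiliations: NEC Corporation

AI summary: Studies the data-driven inverse optimization problem for mixed integer linear programs, reveals the geometric structure of the suboptimality loss, proves that gradient-based optimization methods reach consistency with the observed data in finitely many iterations, and gives an upper bound on the iteration count for projected subgradient descent.

Comments: 66 pages; comments are welcome

AI abstract (translated from Chinese)

The data-driven inverse optimization problem (DDIOP) is the problem of estimating the objective-function parameters (weights) that explain observed optimal-solution data; it arises in many applications, including mixed integer linear programming (MILP). In inverse optimization for MILPs, the prediction error of the features is discontinuous in the weights, which makes it difficult to apply gradient-based optimization directly. This paper focuses on the suboptimality loss, which attains its minimum value of zero when the weights are exactly consistent with the observed data. We reveal the geometric structure of this loss: it is convex and piecewise linear, and the set of weights exactly consistent with the observed data has positive "thickness" rather than being a single point or a thin boundary. Using this structure, we prove the following. First, a broad class of gradient-based optimization methods, including projected subgradient descent, reaches consistency with the observed data in finitely many iterations (an exact solution is obtained in finite time). Second, for projected subgradient descent, we give an explicit upper bound on the number of iterations needed to reach exact consistency. Third, when the forward problem is an integer linear program (ILP), we express this upper bound as a fully explicit iteration count determined solely by the number of samples, the feature dimension, and the structure of the constraint coefficient matrix (for example, if the coefficient matrix is totally unimodular, the iteration count is explicitly bounded by a polynomial in the squared number of samples and the dimension). Numerical experiments confirm this finite-step attainment behavior.

English abstract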

A data-driven inverse optimization problem (DDIOP) is the problem of estimating the objective-function parameters (weights) that explain observed optimal-solution data, and it arises in many applications, including mixed integer linear programming (MILP). In inverse optimization for MILPs, the prediction error of the features is discontinuous with respect to the weights, so applying gradient-based optimization directly is difficult. In this paper we focus on the suboptimality loss. This loss attains its minimum value, zero, if and only if the weights are exactly consistent with the observed data. We reveal a geometric structure of this loss -- it is convex and piecewise linear, and moreover the set of weights that are exactly consistent with the observed data has a positive "thickness" rather than being a single point or a thin boundary -- and use it to show the following. First, a broad class of gradient-based optimization methods, including projected subgradient descent, reaches exact consistency with the observed data in finitely many iterations (an exact solution is obtained in finite time). Second, for projected subgradient descent we give an explicit upper bound on the number of iterations needed to reach exact consistency. Third, when the forward problem is an integer linear program (ILP), we give this upper bound as a fully explicit iteration count determined solely by the number of samples, the dimension of the features, and the structure of the constraint coefficient matrix. Through numerical experiments, we confirm this finite-step attainment behavior.
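
Illustrative sketch (not from the paper): a toy version of projected subgradient descent on the suboptimality loss. Everything specific here is an assumption made for the example: the forward problem is a maximization over a small enumerated set of integer points `X`, there is a single observation `x_obs`, the weights are constrained to the probability simplex, and the step size is constant. With those choices the loss is `max_x theta.x - theta.x_obs`, which is convex and piecewise linear in `theta`, and `x_star - x_obs` is a subgradient.

```python
import numpy as np


def project_to_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection of v onto {theta >= 0, sum(theta) = 1} (sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > css - 1)[0][-1]
    tau = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - tau, 0.0)


def suboptimality_loss(theta, X, x_obs):
    """Return (loss, subgradient) for a maximization forward problem over the rows of X."""
    x_star = X[np.argmax(X @ theta)]  # forward solve by enumeration (toy setting only)
    gap = x_star - x_obs
    return float(theta @ gap), gap


def fit_weights(X, x_obs, theta0, step=0.1, max_iter=1000, tol=1e-12):
    """Projected subgradient descent; stops as soon as the loss reaches zero."""
    theta = project_to_simplex(np.asarray(theta0, dtype=float))
    for it in range(max_iter):
        loss, grad = suboptimality_loss(theta, X, x_obs)
        if loss <= tol:
            return theta, it
        theta = project_to_simplex(theta - step * grad)
    return theta, max_iter


# Toy feasible set: nonnegative integer points with x1 + x2 <= 2.
X = np.array([[0, 0], [1, 0], [0, 1], [2, 0], [1, 1], [0, 2]])
x_obs = np.array([0, 2])  # observed decision
theta, iters = fit_weights(X, x_obs, theta0=[0.8, 0.2])
```

In this toy case every weight vector on the simplex with a larger second component makes `x_obs` optimal, so the zero-loss set is a whole segment rather than a single point, and the iteration stops after a handful of steps. A real MILP would replace the enumeration with a solver call.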


2407.00449 2026-06-18 cs.LG cs.AI cs.NE version update

Fully tensorial approach to hypercomplex-valued neural networks

Agnieszka Niemczynowicz, Radosław Antoni Kycia

Affiliations: Faculty of Computer Science and Mathematics, Cracow University of Technology

Comments: 23 pages, 3 figures

2512.17696 2026-06-18 cs.LG stat.ME stat.ML version update

Spatially-informed transformers: Injecting geostatistical covariance biases into self-attention for spatio-temporal forecasting

Yuri Calleo

Affiliations: Unimercatorum

DOI: 10.1007/s11135-026-02743-9

English abstract

The modeling of high-dimensional spatio-temporal processes presents a fundamental dichotomy between the probabilistic rigor of classical geostatistics and the flexible, high-capacity representations of deep learning. While Gaussian processes offer theoretical consistency and exact uncertainty quantification, their prohibitive computational scaling renders them impractical for massive sensor networks. Conversely, modern transformer architectures excel at sequence modeling but inherently lack a geometric inductive bias, treating spatial sensors as permutation-invariant tokens without a native understanding of distance. In this work, we propose a spatially-informed transformer, a hybrid architecture that injects a geostatistical inductive bias directly into the self-attention mechanism via a learnable covariance kernel. By formally decomposing the attention structure into a stationary physical prior and a non-stationary data-driven residual, we impose a soft topological constraint that favors spatially proximal interactions while retaining the capacity to model complex dynamics. We demonstrate the phenomenon of "Deep Variography", where the network successfully recovers the true spatial decay parameters of the underlying process end-to-end via backpropagation. Extensive experiments on synthetic Gaussian random fields and real-world traffic benchmarks confirm that our method outperforms state-of-the-art graph neural networks. Furthermore, rigorous statistical validation confirms that the proposed method delivers not only superior predictive accuracy but also well-calibrated probabilistic forecasts, effectively bridging the gap between physics-aware modeling and data-driven learning.
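
Illustrative sketch (not from the paper): one way to picture a covariance-kernel bias inside self-attention is to add the log of a distance-based kernel to the usual scaled dot-product logits. The exponential kernel `exp(-d / length_scale)`, the additive placement of the bias, and all names below are assumptions made for this example; the abstract does not state the kernel family or the exact decomposition used. In a trained model `length_scale` would be a learnable parameter; here it is a fixed number.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def spatially_biased_attention(Q, K, V, coords, length_scale=1.0):
    """Scaled dot-product attention with an additive spatial prior.

    logits = Q K^T / sqrt(d_k)      (data-driven term)
           + log exp(-dist / rho)   (stationary prior; depends only on sensor distance)
    """
    d_k = Q.shape[-1]
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    log_kernel = -dist / length_scale
    logits = Q @ K.T / np.sqrt(d_k) + log_kernel
    weights = softmax(logits, axis=-1)
    return weights @ V, weights


# Three sensors on a line at positions 0, 1 and 3. With all-zero queries and
# keys the data-driven term vanishes, so only the spatial prior shapes attention.
coords = np.array([[0.0], [1.0], [3.0]])
Q = np.zeros((3, 2))
K = np.zeros((3, 2))
V = np.eye(3)
out, w = spatially_biased_attention(Q, K, V, coords, length_scale=1.0)
```

In this toy setup the first sensor attends most to itself, less to the sensor one unit away, and least to the sensor three units away, which is the kind of soft preference for nearby interactions the abstract describes.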


2508.06406 2026-06-18 cs.DC cs.LG version update

Blockchain-Enabled Federated Learning

Murtaza Rangwala, KR Venugopal, Rajkumar Buyya

Affiliations: Quantum Cloud and Distributed Systems (qCLOUDS) Lab, School of Computing and Information Systems, The University of Melbourne, Australia; Department of Computer Science and Engineering, University of Visvesvaraya College of Engineering, Bangalore University, India

Comments: 32 pages, 6 figures, chapter for edited book (Federated Learning: Foundations and Applications)

DOI: 10.1016/B978-0-44-344433-3.00018-6

English abstract

Blockchain-enabled federated learning (BCFL) addresses fundamental challenges of trust, privacy, and coordination in collaborative AI systems. This chapter provides a comprehensive architectural analysis of BCFL systems through a systematic four-dimensional taxonomy examining coordination structures, consensus mechanisms, storage architectures, and trust models. We analyze design patterns from blockchain-verified centralized coordination to fully decentralized peer-to-peer networks, evaluating trade-offs in scalability, security, and performance. Through detailed examination of consensus mechanisms designed for federated learning contexts, including Proof of Quality and Proof of Federated Learning, we demonstrate how computational work can be repurposed from arbitrary cryptographic puzzles to productive machine learning tasks. The chapter addresses critical storage challenges by examining multi-tier architectures that balance blockchain's transaction constraints with neural networks' large parameter requirements while maintaining cryptographic integrity. A technical case study of the TrustMesh framework illustrates practical implementation considerations in BCFL systems through distributed image classification training, demonstrating effective collaborative learning across IoT devices with highly non-IID data distributions while maintaining complete transparency and fault tolerance. Analysis of real-world deployments across healthcare consortiums, financial services, and IoT security applications validates the practical viability of BCFL systems, achieving performance comparable to centralized approaches while providing enhanced security guarantees and enabling new models of trustless collaborative intelligence.


2508.20275 2026-06-18 cs.LG cs.CL q-bio.QM version update

A Systematic Review on the Generative AI Applications in Human Medical Genomics

Anton Changalidis, Yury Barbitoff, Yulia Nasykhova, Andrey Glotov

Affiliations: Dpt. of Genomic Medicine; D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology

Comments: 31 pages, 5 figures

DOI: 10.3389/fgene.2025.1694070
Journal ref: Frontiers in Genetics 16 (2025) 1694070

English abstract

Although traditional statistical techniques and machine learning methods have contributed significantly to genetics and, in particular, inherited disease diagnosis, they often struggle with complex, high-dimensional data, a challenge now addressed by state-of-the-art deep learning models. Large language models (LLMs), based on transformer architectures, have excelled in tasks requiring contextual comprehension of unstructured medical data. This systematic review examines the role of LLMs in the genetic research and diagnostics of both rare and common diseases. Automated keyword-based search in PubMed, bioRxiv, medRxiv, and arXiv was conducted, targeting studies on LLM applications in diagnostics and education within genetics and removing irrelevant or outdated models. A total of 172 studies were analyzed, highlighting applications in genomic variant identification, annotation, and interpretation, as well as medical imaging advancements through vision transformers. Key findings indicate that while transformer-based models significantly advance disease and risk stratification, variant interpretation, medical imaging analysis, and report generation, major challenges persist in integrating multimodal data (genomic sequences, imaging, and clinical records) into unified and clinically robust pipelines, facing limitations in generalizability and practical implementation in clinical settings. This review provides a comprehensive classification and assessment of the current capabilities and limitations of LLMs in transforming hereditary disease diagnostics and supporting genetic education, serving as a guide to navigate this rapidly evolving field.


2503.01163 2026-06-18 cs.AI cs.CL cs.HC cs.LG cs.NE version update

Bandit-Based Prompt Design Strategy Selection Improves Prompt Optimizers

Rin Ashizawa, Yoichi Hirose, Nozomu Yoshinari, Kento Uchida, Shinichi Shirakawa

Affiliations: Yokohama National University

Comments: Accepted to ACL 2025 Findings

2502.15376 2026-06-18 cs.LG cond-mat.mes-hall version update

Learning Chern Numbers of Topological Insulators with Gauge Equivariant Neural Networks

Longde Huang, Oleksandr Balabanov, Hampus Linander, Mats Granath, Daniel Persson, Jan E. Gerken

Affiliations: Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg; Department of Physics, Stockholm University, AlbaNova University Center; VERSES AI Research Lab, Los Angeles, USA; Department of Physics, University of Gothenburg

2410.23503 2026-06-18 cs.LG version update

Development and Comparative Analysis of Machine Learning Models for Hypoxemia Severity Triage in CBRNE Emergency Scenarios Using Physiological and Demographic Data from Medical-Grade Devices

Santino Nanini, Mariem Abid, Yassir Mamouni, Arnaud Wiedemann, Philippe Jouvet, Stephane Bourassa

Affiliations: SADC-CDSS IA PEDIATRICS, CHU Sainte-Justine, Montreal, Canada; Solutions Applicare AI Inc., Montreal, Canada; Université de Montréal, Canada; MEDINT CBRNE Group, Montreal, Canada

Comments: 12 figures, 12 tables and 39 pages

DOI: 10.3390/diagnostics14232763
Journal ref: Diagnostics 14 (2024) 2763

English abstract

This paper presents the development of machine learning (ML) models to predict hypoxemia severity during emergency triage, especially in Chemical, Biological, Radiological, Nuclear, and Explosive (CBRNE) events, using physiological data from medical-grade sensors. Gradient Boosting Models (XGBoost, LightGBM, CatBoost) and sequential models (LSTM, GRU) were trained on physiological and demographic data from the MIMIC-III and IV datasets. A robust preprocessing pipeline addressed missing data and class imbalances and incorporated synthetic data flagged with masks. Gradient Boosting Models (GBMs) outperformed sequential models in terms of training speed, interpretability, and reliability, making them well-suited for real-time decision-making. While their performance was comparable to that of sequential models, the GBMs used score features from six physiological variables derived from the enhanced National Early Warning Score (NEWS) 2, which we termed NEWS2+. This approach significantly improved prediction accuracy. While sequential models handled temporal data well, their performance gains did not justify the higher computational cost. A 5-minute prediction window was chosen for timely intervention, with minute-level interpolations standardizing the data. Feature importance analysis highlighted the significant role of mask and score features in enhancing both transparency and performance. Temporal dependencies proved to be less critical, as Gradient Boosting Models were able to capture key patterns effectively without relying on them. This study highlights ML's potential to improve triage and reduce alarm fatigue. Future work will integrate data from multiple hospitals to enhance model generalizability across clinical settings.


2211.01960 2026-06-18 q-bio.NC cs.HC cs.LG version update

FingerFlex: Inferring Finger Trajectories from ECoG signals

Vladislav Lomtev, Alexander Kovalev, Alexey Timchenko

Affiliations: Bauman Moscow State Technical University; ALVI Labs; Brain Dynamics Group, Higher School of Economics; University of Tuebingen

Comments: 6 pages, 3 figures, 4 tables. Preprint. Under review

1909.13203 2026-06-18 cs.LG stat.ML version update

Learning transport cost from subset correspondence

Ruishan Liu, Akshay Balsubramani, James Zou

Affiliations: Department of Electrical Engineering; Department of Genetics; Stanford University; Department of Biomedical Data Science

1. Deep learning architectures and training methods (12 papers)

RNN(p) for Power Consumption Forecasting

Depth-Width tradeoffs in Algorithmic Reasoning of Graph Tasks with Transformers

Generalized Kullback-Leibler Divergence Loss

Self-Evolving Multi-Agent Systems via Textual Backpropagation

Decomposing Prediction Mechanisms for In-Context Recall

InstructTime++: Time Series Classification with Multimodal Language Modeling via Implicit Feature Enhancement

TINNs: Time-Induced Neural Networks for Solving Time-Dependent PDEs

The Long Delay to Arithmetic Generalization: When Learned Representations Outrun Behavior

Beyond Similarity: Temporal Operator Attention for Time Series Analysis

Trust Region On-Policy Distillation

HAARES Half-Split Residual Basis Routing for Deep Transformers

Cosmos 3: Omnimodal World Models for Physical AI

2. Representation Learning, Self-Supervised and Contrastive Learning 4 papers

Self-attention-based non-linear basis transformations for compact latent space modelling of dynamic optical fibre transmission matrices

Clin-JEPA: A Multi-Phase Co-Training Framework for Joint-Embedding Predictive Pretraining on EHR Patient Trajectories

Bag of Dims: Training-Free Mechanistic Interpretability via Dimension-Level Sign Patterns

Zero-Shot Cross-City Generalization in End-to-End Autonomous Driving: Self-Supervised versus Supervised Representations

3. Reinforcement Learning and Sequential Decision-Making 5 papers

Reinforcement Learning for Accelerated Aerodynamic Shape Optimisation

Hierarchical Planning with Latent World Models

Short-Term-to-Long-Term Memory Transfer for Knowledge Graphs under Partial Observability

SymQNet: Amortized Acquisition for Low-Latency Adaptive Hamiltonian Learning

GrowthHacker: Automated Off-Policy Evaluation Optimization Using Code-Modifying LLM Agents

4. Generative Models and Probabilistic Modeling 12 papers

PRISM: A 3D Probabilistic Neural Representation for Interpretable Shape Modeling

Riemannian MeanFlow for One-Step Generation on Manifolds

Generative models for decision-making under distributional shift

Dimension-Free Convergence of Discrete Diffusion Models: Adjoint Equations Induce the Right Space

Unsupervised Diffusion Solver for Combinatorial Optimization via Combinatorial Adjoint Matching

UPLOTS: A Unified Pretrained Language Model for Constrained Time-series Generation

DiPOD: Diffusion Policy Optimization without Drifting Apart

VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation

Regular Fourier Features for Nonstationary Gaussian Processes

Triangular-Reference Schrödinger Bridges for Time Series Generation

Latent-Conditioned Parameterized Quantum Circuits as Universal Approximators for Distributions over Quantum States

A Bayesian Boolean Matrix Factorization with Application to Copy Number Analysis in Cancer

5. Optimization, Generalization, and Theoretical Analysis 10 papers

The Implicit Bias of Steepest Descent with Mini-batch Stochastic Gradient

Scalable Batch Bayesian Optimization Via Subspace Acquisition Functions

On the Stability of the Jacobian Matrix in Deep Neural Networks

Stochastic Adaptive Gradient Descent Without Descent

On the Stability of Nonlinear Dynamics in GD and SGD: Beyond Quadratic Potentials

QUIVER: Cost-Aware Adaptive Preference Querying in Surrogate-Assisted Evolutionary Multi-Objective Optimization

Clustering and Pruning in Causal Data Fusion

How fast can you find a good hypothesis?

How Does the ReLU Activation Affect the Implicit Bias of Gradient Descent on High-dimensional Neural Network Regression?

Everywhere Valid Bounds on False Discovery Proportions in Conformal Inference

6. Efficient Learning, Compression, and Deployment 6 papers

Task-Adaptive Parameter-Efficient Fine-Tuning for Weather Foundation Models

HeRo-Q: A General Framework for Stable Low Bit Quantization via Hessian Conditioning

LLM Compression by Block Removal with Constrained Binary Optimization

KANELÉ: Kolmogorov-Arnold Networks for Efficient LUT-based Evaluation

Ultrafast On-chip Online Learning via Spline Locality in Kolmogorov-Arnold Networks

Knockoffs-based False Discovery Rate Control and Simplification for Deep Neural Networks

7. Federated Learning, Privacy, and Security 4 papers

Efficient Zeroth-Order Federated Finetuning of Language Models on Resource-Constrained Devices

FinP: Fairness-in-Privacy in Federated Learning by Addressing Disparities in Privacy Risk

Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs

Automated Byzantine-Resilient Clustered Decentralized Federated Learning for Battery Intelligence in Connected EVs

8. Robustness, Uncertainty, and Trustworthy Learning 5 papers

RUB: Evaluating Residual Knowledge in Unlearned Models

Revealing Hidden Vulnerabilities in Autoencoders through Gradient Signal Restoration

Calibrated Sampling-Free Uncertainty Estimation in Bayesian Deep Learning

Robust Detection of Planted Subgraphs in Semi-Random Models

Not Just How Much, But Where: Decomposing Epistemic Uncertainty into Per-Class Contributions

9. Graph Learning and Structured Data 3 papers

UST-GNN: A Unified Spatial–Topological Graph Neural Network Framework for Urban Analytics – Demonstrated through a Case Study on Urban Health Prediction

Formalizing and Mitigating Structural Distortion in LLM Attention for Graph Reasoning

Fully Geometric Multi-Hop Reasoning on Knowledge Graphs with Transitive Relations

10. Transfer, Meta-, and Continual Learning 5 papers

From Memorization to Parameter Interference: How Overtraining Experts Harms Model Merging

Do Neural Networks Lose Plasticity in a Gradually Changing World?

Simple Domain Generalization Methods are Strong Baselines for Open Domain Generalization

Rethinking Cross-lingual Gaps from a Statistical Viewpoint

Anti-causal domain generalization: Leveraging unlabeled data

11. Datasets, Benchmarks, and Evaluation 14 papers

Benchmarking Physics-Informed Time-Series Models for Operational Global Station Weather Forecasting

FORGE: Foundational Optimization Representations from Graph Embeddings

Surrogate Benchmarks for Model Merging Optimization