arXivDaily: arXiv daily academic digest, updated Monday to Friday
All subject categories (3412)
2605.24593 2026-05-26 cs.CV

Self-supervised Dynamic Heterogeneous Degradation Modeling for Unified Zero-Shot Image Restoration

XiaoWan Hu, Jing Yang, HeNan Liu, HuaQiu Li, Mai Xu

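AI summary: Proposes a unified physical zero-shot image restoration framework that reparameterizes heterogeneous degradations into a homogeneous distribution and introduces a dynamic quality-refinement strategy, achieving the best performance under both single and mixed degradations.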

Abstract

Zero-shot image restoration provides a flexible way to handle diverse degradations without task-specific training. However, existing methods typically rely on stacked layers or pre-trained features to enhance degradation expression while overlooking physically consistent priors. These insufficient degradation prompts impose a heavy training burden and high sampling costs during zero-shot diffusion. Moreover, a fixed inference trajectory often collapses to suboptimal solutions under complex corruptions. We observe that heterogeneous degradations can be reparameterized into a minimal set of physically coherent parameters for compact representation. Based on this insight, we first propose a unified physical zero-shot image restoration (UP-ZeroIR) framework that explicitly models heterogeneous degradations as a homogeneous all-in-one distribution. This distribution can be optimized directly in the latent space, enabling principled solution exploration and effective prompt adaptation. In addition, we introduce a dynamic quality-refinement strategy that adaptively adjusts the diffusion trajectory for robust, globally optimal convergence. Extensive experiments demonstrate that our method achieves state-of-the-art performance under both single and mixed degradations. Our code is available at https://github.com/yangjinglyy/UP-ZeroIR

2605.24592 2026-05-26 cs.RO

MuGen: Multi-Skill Generative Locomotion Controller for Humanoid Robots

Yusen Feng, Xiang Wang, Heyuan Yao, Zixi Kang, Xinyu Huo, Boyang Yu, Pengyun Qiu, Ruijie Zhao, Baoquan Chen, Libin Liu

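AI summary: Proposes MuGen, a framework that uses VQ-VAEs and teacher-student policy distillation to learn a generative locomotion representation from heterogeneous human motion data, enabling humanoid robots to perform multi-skill locomotion and imitate unseen motions.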

Abstract

This paper presents MuGen, a data-driven framework for learning and deploying multi-skill locomotion on humanoid robots. MuGen enables a robot to perform expressive, human-like motions under the guidance of example motion sequences. To achieve this, we employ vector-quantized variational autoencoders (VQ-VAEs) trained with model-based reinforcement learning, resulting in a generative representation of locomotion that captures key patterns of human motion from hours of heterogeneous human performance data. We adopt a teacher-student learning framework and develop a new policy distillation strategy that enables a deployable student policy to learn this efficient latent representation. This policy allows the robot to track and mimic unseen human motions and to reuse the learned latent space for other tasks. We demonstrate the effectiveness of our framework by executing a diverse set of motions accurately.
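
The vector-quantization step at the core of a VQ-VAE can be sketched as a nearest-neighbour lookup into a learned codebook. This is a generic illustration assuming squared-L2 distance and a toy 2-D codebook; it is not MuGen's code and omits the encoder, decoder, model-based RL training, and the straight-through gradient normally used to train through the lookup.

```python
import math

def quantize(z, codebook):
    """Return (index, code) of the codebook vector nearest to z in squared L2 distance."""
    best_idx, best_dist = 0, math.inf
    for idx, code in enumerate(codebook):
        dist = sum((zi - ci) ** 2 for zi, ci in zip(z, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx, codebook[best_idx]

def quantize_sequence(latents, codebook):
    """Map a sequence of continuous latents to discrete codebook indices (motion 'tokens')."""
    return [quantize(z, codebook)[0] for z in latents]

# Toy usage: a 3-entry codebook in a 2-D latent space.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
tokens = quantize_sequence([[0.1, -0.1], [0.9, 0.2], [0.2, 0.8]], codebook)
print(tokens)  # [0, 1, 2]
```

Each continuous latent (for example, one per motion chunk) is replaced by the index of its nearest code, giving a discrete token sequence.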

2605.24590 2026-05-26 cs.CV cs.LG stat.ML

Physen-Noise2Noise: Physics-Guided Self-Supervised Defocus Deblurring with Bias Correction under Low-Light Conditions

Ziyan Huang, Lang Wu, Hongji Wang, Yifei Liu, Dongliang Tang, Hongqiao Wang

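AI summary: Proposes Physen-Noise2Noise, a physics-model-based self-supervised defocus deblurring framework that uses a learnable noise-bias parameter and a frequency-domain constraint to jointly correct biased noise and recover high-frequency detail without clean reference images.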

Comments 14 pages

Abstract

Low-light, long-exposure defocus deblurring remains a challenging problem due to the simultaneous presence of severe blur and complex biased noise. Existing methods typically rely on simplified noise assumptions, which limits their effectiveness under realistic imaging conditions. In this work, we propose Physen-Noise2Noise, a self-supervised deblurring framework guided by the physical model of defocus imaging, which leverages noisy multi-frame observations without requiring clean reference images. Unlike conventional Noise2Noise-based approaches that assume zero-mean noise, we derive a frequency-domain constraint inherent to the defocus imaging process and incorporate it into the learning framework via a learnable noise bias parameter. In addition, a multi-frame noisy initialization strategy is introduced to suppress complex biased noise prior to deblurring, providing a more stable starting point for reconstruction. This formulation explicitly models biased noise and enables joint bias correction and high-frequency detail recovery during training. Furthermore, we develop a pretrain-finetune variant to enhance robustness and generalization under challenging noise conditions. Extensive experiments on both simulated and real-world datasets demonstrate that the proposed method consistently outperforms state-of-the-art self-supervised approaches for defocus deblurring in the presence of complex biased noise.

2605.24588 2026-05-26 cs.AI cs.LG

HeartBeatAI: An Interpretable and Robust Deep Learning Framework for Multi-Label ECG Arrhythmia Detection

Shubham Gupta, Nikhil Panwar, Partha Pratim Roy

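AI summary: Proposes HeartBeatAI, a framework that combines domain generalization, multi-scale feature aggregation, and clinical explainability, using a Squeeze-and-Excitation ResNet and a Multi-Layer Concentration Pipeline for robust 12-lead ECG classification. It reaches a 98% macro F1-score under intra-source evaluation, but leave-one-domain-out evaluation shows that detecting rare anomalies remains a challenge for cross-institutional deployment.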

Abstract

While Deep Learning (DL) enhances automated electrocardiogram (ECG) analysis, clinical deployment is hindered by class imbalance and the generalization gap. This paper presents HeartBeatAI, a deep learning framework combining domain generalization, multi-scale feature aggregation, and clinical explainability for robust 12-lead ECG classification. Moving beyond image-based paradigms, HeartBeatAI integrates a Squeeze-and-Excitation (SE) ResNet to isolate diagnostic leads alongside a Multi-Layer Concentration Pipeline to capture macro-rhythm and micro-morphological anomalies. To mitigate domain shift, the framework employs MixStyle regularization and Label Smoothing. Rigorous benchmarking across four large-scale datasets using intra-source and Leave-One-Domain-Out (LODO) protocols demonstrates high performance (98% Macro F1-score) under intra-source conditions. However, LODO evaluations reveal significant degradation in detecting rare anomalies, highlighting a persistent challenge in cross-institutional deployment.
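
MixStyle, as introduced for domain generalization by Zhou et al. (2021), perturbs the per-instance channel statistics (mean and standard deviation) of intermediate features by mixing them with those of another sample in the batch. A minimal NumPy sketch for 1-D feature maps follows, assuming a (batch, channels, length) layout and a Beta(0.1, 0.1) mixing weight; the abstract does not say where HeartBeatAI inserts the layer or which hyperparameters it uses. In practice the operation is applied only during training, usually with a fixed probability per batch.

```python
import numpy as np

def mixstyle(x, rng, alpha=0.1, eps=1e-6):
    """Mix per-instance channel statistics across a batch of 1-D feature maps.

    x:   array of shape (batch, channels, length), e.g. intermediate ECG features.
    rng: a numpy.random.Generator used for the mixing weights and the shuffle.
    """
    mu = x.mean(axis=2, keepdims=True)                 # (B, C, 1) instance means
    sig = np.sqrt(x.var(axis=2, keepdims=True) + eps)  # (B, C, 1) instance stds
    x_norm = (x - mu) / sig                            # remove instance "style"
    lam = rng.beta(alpha, alpha, size=(x.shape[0], 1, 1))
    perm = rng.permutation(x.shape[0])                 # partner sample for each row
    mu_mix = lam * mu + (1.0 - lam) * mu[perm]
    sig_mix = lam * sig + (1.0 - lam) * sig[perm]
    return x_norm * sig_mix + mu_mix                   # re-apply the mixed style

rng = np.random.default_rng(0)
features = rng.normal(size=(8, 12, 100))  # 8 records, 12 channels, 100 time steps
augmented = mixstyle(features, rng)
```

Because only first- and second-order statistics are exchanged, the normalized waveform shape of each sample is preserved while its channel-level "style" is shifted toward another sample's.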

2605.24585 2026-05-26 cs.CL q-bio.NC

Word Class Representations Spontaneously Emerge from Successor Representations Trained on Natural Language

Mathis Immertreu, Achim Schilling, Thomas Kinfe, Patrick Krauss

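AI summary: Applies the successor representation (SR) framework from reinforcement learning to natural language. By training a neural network to predict future word distributions, it finds that a geometric organization of word classes (e.g., nouns, verbs, adjectives) emerges without supervision, and that the prediction horizon shapes which level of structure appears.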

Abstract

Language models are typically trained to predict the next token in a sequence. Here, we explore an alternative predictive principle from reinforcement learning: Successor Representations (SRs), which model the expected discounted distribution of future states rather than the immediate next state. We transfer this framework to natural language and train neural networks to predict future word distributions across multiple temporal horizons, thereby learning representations of long-range transition structure. We train a deep residual neural network on WikiText-103 (103 million tokens; 20,000-word vocabulary) and optimize successor representations as probability distributions using KL divergence. Without explicit linguistic supervision, structured language representations emerge spontaneously. After training, the learned space develops a clear geometric organization with respect to part-of-speech (POS) categories: nouns, verbs, and adjectives become separable and recoverable through unsupervised clustering. This organization depends systematically on predictive horizon, with short horizons producing the strongest syntactic structure and longer horizons increasingly integrating broader contextual and semantic information. At finer resolutions, additional interpretable lexical substructure emerges, revealing coherent subclasses within major word categories. These findings suggest that syntactic categories need not be explicitly encoded but may arise as a consequence of predictive sequence learning. To our knowledge, this work provides the first systematic application of successor representations to natural language and establishes a conceptual bridge between reinforcement learning, linguistics, and cognitive neuroscience.
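
For a word-to-word transition matrix $T$ and discount $\gamma \in [0, 1)$, one common way to write a normalized successor representation is $M = (1-\gamma) \sum_{k \ge 1} \gamma^{k-1} T^k$, whose rows are probability distributions over future words; larger $\gamma$ means a longer effective horizon (roughly $1/(1-\gamma)$ steps). The sketch below computes this tabular quantity from bigram counts by fixed-point iteration. It is only an illustration under that assumed normalization: the paper trains a deep residual network with a KL objective rather than building the table directly.

```python
def bigram_transitions(tokens, vocab):
    """Row-stochastic matrix T[i][j] = P(next word = vocab[j] | current word = vocab[i])."""
    idx = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    counts = [[0.0] * n for _ in range(n)]
    for a, b in zip(tokens, tokens[1:]):
        counts[idx[a]][idx[b]] += 1.0
    T = []
    for row in counts:
        total = sum(row)
        # Words never followed by anything get a uniform row so T stays stochastic.
        T.append([c / total for c in row] if total else [1.0 / n] * n)
    return T

def matmul(A, B):
    """Plain-Python matrix product for small dense matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def successor_distribution(T, gamma, iters=200):
    """M = (1 - gamma) * sum_{k>=1} gamma^(k-1) T^k via M <- (1 - gamma) T + gamma T M.

    The fixed point has rows summing to 1; the error shrinks like gamma**iters.
    """
    n = len(T)
    M = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        TM = matmul(T, M)
        M = [[(1 - gamma) * T[i][j] + gamma * TM[i][j] for j in range(n)] for i in range(n)]
    return M

tokens = "the cat sat the dog sat the cat ran".split()
vocab = sorted(set(tokens))
T = bigram_transitions(tokens, vocab)
M_short = successor_distribution(T, gamma=0.1)  # close to next-word statistics
M_long = successor_distribution(T, gamma=0.9)   # spread over a longer horizon
```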

2605.24584 2026-05-26 cs.LG cs.AI

LAPLEX: The FFT of Learnable Laplace Kernels

Łukasz Struski, Hanna Blazhko, Piotr Kubaty, Jacek Tabor

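AI summary: Proposes LAPLEX operators, which implicitly define full-rank dense matrices through learnable coordinate anchors, enabling trainable matrix-vector operations with FFT-like scaling and decoupling expressivity from storage cost.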

Abstract

Fast linear algebra in deep learning usually comes with a choice: fixed geometry and exact computation, as in the Fourier transform, or adaptive geometry paid for by dense parameters, random features, or low-rank surrogates. To move beyond this trade-off, we introduce LAPLEX, a class of exact, trainable (phased) Laplace-kernel operators. A LAPLEX layer is a dense, typically full-rank matrix implicitly defined by learnable coordinate anchors, with FFT-like scaling. Consequently, it supports trainable matrix-vector operations at vector dimensions up to $10^9$ on modern GPUs. As a neural layer, it yields compact projections and classification heads interpretable as soft, trainable routing models. The same primitive also serves as an efficient Gram operator, enabling high-dimensional covariance models on flattened images of dimension $3 \cdot 10^6$ that preserve visible spatial structure without imposing convolutional bias. These applications reflect a single principle: dense geometry can be learned without storing a dense matrix, which enables data-adaptive global interactions in regimes where ordinary dense layers are out of reach. In this sense, LAPLEX separates expressivity from storage cost: it behaves like a dense trainable matrix, but is represented and applied through a small structured set of parameters.
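
The idea that a dense kernel matrix can be applied without being stored has a classical one-dimensional illustration: for sorted points, the Laplace kernel $K_{ij} = e^{-\lambda |x_i - x_j|}$ factorizes along the line, so $Kv$ follows from one forward and one backward recursion in $O(n)$ time after an $O(n \log n)$ sort. This sketch is not LAPLEX's construction (the abstract does not describe its phased kernels or anchor geometry); it only demonstrates the storage-free principle, checked against the explicit $O(n^2)$ product. For distinct points this matrix is positive definite, hence full rank.

```python
import math

def laplace_matvec_dense(x, v, lam=1.0):
    """Reference O(n^2) product: y_i = sum_j exp(-lam * |x_i - x_j|) * v_j."""
    return [sum(math.exp(-lam * abs(xi - xj)) * vj for xj, vj in zip(x, v)) for xi in x]

def laplace_matvec_fast(x, v, lam=1.0):
    """Same product without forming the matrix: O(n log n) sort plus two O(n) sweeps."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    xs = [x[i] for i in order]
    vs = [v[i] for i in order]
    # left[k] = sum over j <= k of exp(-lam * (xs[k] - xs[j])) * vs[j]
    left = [0.0] * n
    for k in range(n):
        left[k] = vs[k]
        if k > 0:
            left[k] += math.exp(-lam * (xs[k] - xs[k - 1])) * left[k - 1]
    # right[k] = sum over j >= k of exp(-lam * (xs[j] - xs[k])) * vs[j]
    right = [0.0] * n
    for k in reversed(range(n)):
        right[k] = vs[k]
        if k < n - 1:
            right[k] += math.exp(-lam * (xs[k + 1] - xs[k])) * right[k + 1]
    y = [0.0] * n
    for k, i in enumerate(order):
        y[i] = left[k] + right[k] - vs[k]  # the diagonal term was counted twice
    return y

x = [0.3, -1.2, 2.5, 0.0]
v = [1.0, 2.0, -1.0, 0.5]
print(laplace_matvec_fast(x, v, lam=0.7))
print(laplace_matvec_dense(x, v, lam=0.7))  # same values up to rounding
```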

2605.24579 2026-05-26 cs.CL

WhenLoss: Diagnosing Write and Retrieval Bottlenecks in Long-Context Memory Systems

Jiangnan Yu, Kisson Songqi Lin, Jilong Wu

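AI summary: Proposes a four-condition diagnostic protocol showing that the write stage is the main bottleneck of long-context memory systems, and on that basis proposes Expected Predictive Compression (EPC), which uses an LLM at write time to anticipate future questions and retain the key evidence, improving system performance.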

Comments 14 pages, 7 figures, 9 tables

Abstract

Long-context memory systems often fail under fixed budgets, but end-to-end evaluation does not reveal whether evidence was discarded during compression or preserved but never retrieved. We introduce a four-condition diagnostic protocol that evaluates a fixed reader under truncated full context (TFC), oracle evidence (OE), complete stored memory (CSM), and retrieved memory (RM). Under this fixed-budget LongMemEval setup, write-side gaps exceed retrieval-side gaps for most tested baselines, with four of six baselines robustly write-dominant under our default diagnosis margin. Motivated by this diagnosis, we propose Expected Predictive Compression (EPC), which moves the key decision (what information to retain) to write time: it uses an LLM to anticipate likely future questions and preserves the minimal supporting evidence under the token budget, while leaving retrieval unchanged at question time. Across all 500 LongMemEval questions with three readers (GPT-5.2, Claude Sonnet 4, Gemini 2.5 Pro), EPC achieves the highest CSM scores among all systems (0.49 vs. 0.44 for Summary (LLM), the strongest baseline), reducing $\Delta_{\mathrm{write}}$ to 0.04 while keeping $\Delta_{\mathrm{retr}}$ comparable to other LLM-based systems. These results suggest that, on this benchmark and evaluation setup, improving what the write stage preserves is a key avenue for performance gains in the tested systems.
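
The write/retrieval split can be read as a decomposition of reader scores across conditions. The sketch below assumes $\Delta_{\mathrm{write}} = \mathrm{OE} - \mathrm{CSM}$ and $\Delta_{\mathrm{retr}} = \mathrm{CSM} - \mathrm{RM}$, plus a hypothetical margin of 0.02; the abstract names these quantities but does not give their exact definitions or the default margin, so treat both as assumptions. TFC is kept as a field for completeness but is not used in the two gaps here.

```python
from dataclasses import dataclass

@dataclass
class ConditionScores:
    """Reader accuracy under each condition of the diagnostic protocol."""
    tfc: float  # truncated full context (reference only; unused below)
    oe: float   # oracle evidence
    csm: float  # complete stored memory
    rm: float   # retrieved memory

def diagnose(s: ConditionScores, margin: float = 0.02) -> dict:
    """Split end-to-end loss into write-side and retrieval-side gaps.

    ASSUMED definitions (not stated in the abstract):
      delta_write = OE - CSM   (evidence lost when the memory was written)
      delta_retr  = CSM - RM   (evidence stored but not retrieved)
    """
    delta_write = s.oe - s.csm
    delta_retr = s.csm - s.rm
    if delta_write - delta_retr > margin:
        label = "write-dominant"
    elif delta_retr - delta_write > margin:
        label = "retrieval-dominant"
    else:
        label = "mixed"
    return {"delta_write": delta_write, "delta_retr": delta_retr, "label": label}

scores = ConditionScores(tfc=0.40, oe=0.60, csm=0.42, rm=0.38)  # invented numbers
print(diagnose(scores)["label"])  # write-dominant
```

With these invented scores the write gap (0.18) dwarfs the retrieval gap (0.04), which is the pattern the abstract reports for most baselines.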

2605.24578 2026-05-26 cs.CV

World Models as Group Actions

Zijie Wang, Wei Zhang, Weiming Zhang, Fanqi Zhang, Xiao Tan, Yipeng Qin, Guanbin Li

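AI summary: Formalizes action-conditioned world modeling as a group action on the state space, enforces identity, inverse, and composition consistency through latent-space regularization, and introduces Group-Action Consistency (GAC) and Group-Action Robustness (GAR) metrics to evaluate structural correctness and rollout stability.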

Comments Under review

Abstract

Video world models have achieved strong visual realism, but this does not ensure that their dynamics are truly governed by actions. In this work, we argue that action faithfulness should be understood through the compositional structure of actions, which in many embodied settings follows a group structure (e.g., SE(2) for navigation). Based on this insight, we formalize action-conditioned world modeling as realizing a group action on the state space, providing a principled criterion for evaluating dynamics beyond visual quality. To operationalize this framework, we propose a unified approach that enforces identity, inverse, and composition consistency via latent-space regularization with synthesized supervision, avoiding additional data collection. We further introduce two metrics, Group-Action Consistency (GAC) and Group-Action Robustness (GAR), to evaluate structural correctness and rollout stability. Extensive experimental results show that our method consistently improves both GAC and GAR in state-of-the-art video world models without degrading perceptual quality.
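
The three group-action constraints can be made concrete for planar navigation, where actions are body-frame SE(2) increments. In this toy sketch, an exact pose integrator stands in for a learned world model and a version that over-rotates by 10% stands in for an imperfect one; the paper enforces these constraints in the latent space of video world models with synthesized supervision, which this example does not attempt.

```python
import math

def act(pose, g):
    """Apply a body-frame SE(2) action g = (dx, dy, dtheta) to a pose (x, y, theta)."""
    x, y, th = pose
    dx, dy, dth = g
    c, s = math.cos(th), math.sin(th)
    return (x + c * dx - s * dy, y + s * dx + c * dy, th + dth)

def compose(g1, g2):
    """Group product with act(act(p, g1), g2) == act(p, compose(g1, g2)).

    It has the same formula as act(), treating g1 as a pose."""
    return act(g1, g2)

def inverse(g):
    """Inverse element, so that compose(g, inverse(g)) == (0, 0, 0)."""
    dx, dy, dth = g
    c, s = math.cos(dth), math.sin(dth)
    return (-(c * dx + s * dy), s * dx - c * dy, -dth)

def pose_error(p, q):
    """Euclidean combination of position error and wrapped heading error."""
    dth = math.atan2(math.sin(p[2] - q[2]), math.cos(p[2] - q[2]))
    return math.hypot(p[0] - q[0], p[1] - q[1], dth)

def consistency_errors(model, state, a, b):
    """Identity, inverse, and composition residuals for model(state, action)."""
    identity = pose_error(model(state, (0.0, 0.0, 0.0)), state)
    inv = pose_error(model(model(state, a), inverse(a)), state)
    comp = pose_error(model(model(state, a), b), model(state, compose(a, b)))
    return identity, inv, comp

# Hypothetical imperfect model: it scales every commanded rotation by 1.1.
def biased_model(pose, g):
    return act(pose, (g[0], g[1], 1.1 * g[2]))

s0, a, b = (0.5, -1.0, 0.2), (1.0, 0.0, 0.5), (1.0, 0.5, 0.3)
print(consistency_errors(act, s0, a, b))           # all near zero
print(consistency_errors(biased_model, s0, a, b))  # inverse and composition > 0
```

The biased model passes the identity check but fails the inverse and composition checks, illustrating that identity consistency alone does not detect this kind of action distortion.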

2605.24577 2026-05-26 cs.LG cs.AI cs.CL

Polymorphism Is Rotation: Operational Mechanistic Interpretability from a Two-Layer Transformer to Pythia-70m

Jordan F. McCann

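AI summary: Finds that the residual-stream bases of independently trained transformers are related by a uniform random rotation, and uses an orthogonal Procrustes fit to transfer feature dictionaries and steering vectors between models without retraining.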

Comments 26 pages, 4 figures, 40 references. Pre-registered four-bar framework; all numerical claims reproducible

Abstract

Independently trained transformers compute the same function in residual-stream bases that differ by a uniform random rotation on $\mathrm{SO}(d_{\mathrm{model}})$. We call this phenomenon polymorphism: same function, mutually unintelligible interior coordinates. One matrix multiplication per model pair removes it: an orthogonal Procrustes fit on a single batch of activations transfers sparse-autoencoder feature dictionaries and steering vectors between independently trained models, with no retraining. The phenomenon is invisible to the standard SAE universality metric. Decoder-column cosine similarity matches across seeds at 98%, the SAE-universality headline number, while an SAE trained on one seed reconstructs another seed's activations at negative explained variance, worse than predicting the constant mean. The decoder columns align; the encoder reads from a rotated frame. A single Procrustes rotation $R$ restores reconstruction to within 0.025 EV of the within-seed ceiling at every internal site. $R$ is Haar-distributed: $\|R - I\|_F$ matches the random-orthogonal prediction $\sqrt{2 d_{\mathrm{model}}}$ to 0.1% at $d_{\mathrm{model}} = 512$, and a Kolmogorov-Smirnov test of $R$'s eigenvalue spectrum against Haar $\mathrm{SO}(d_{\mathrm{model}})$ returns $p \approx 1.000$ pooled and per-pair. Diff-of-means steering vectors transfer in three regimes by alignment with $R$'s invariant subspace: clean when pinned by shared output weights, partial when overlapping the rotated subspace, inverted otherwise. With no shared I/O (Pythia), all three collapse to universally inverted. The same rotation account holds across training checkpoints within a single run. Validated on a 104k-parameter Dyck-3 transformer and nine independently-trained Pythia-70m seeds on The Pile, via a pre-registered four-bar operational framework. Frontier-scale (10B+) replication remains open.
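
The orthogonal Procrustes step has a closed form: for paired activation batches $A, B \in \mathbb{R}^{n \times d}$, the orthogonal $R$ minimizing $\|AR - B\|_F$ is $UV^\top$, where $U \Sigma V^\top$ is the SVD of $A^\top B$. The NumPy sketch below fits $R$ on synthetic data in which one batch is an exact rotation of the other, and checks the Haar prediction $\|R - I\|_F \approx \sqrt{2d}$, which follows from $\|R - I\|_F^2 = 2d - 2\,\mathrm{tr}\,R$ and $\mathbb{E}[\mathrm{tr}\,R] = 0$. Activations from two real trained models would not be an exact rotation of each other; the data here is purely synthetic.

```python
import numpy as np

def random_rotation(d, rng):
    """Sample a d x d rotation from the Haar (uniform) measure on SO(d)."""
    q, r = np.linalg.qr(rng.normal(size=(d, d)))
    q = q * np.sign(np.diag(r))   # sign fix makes q Haar-distributed on O(d)
    if np.linalg.det(q) < 0:      # map the det = -1 half onto SO(d)
        q[:, 0] = -q[:, 0]
    return q

def procrustes_rotation(a, b):
    """Orthogonal R minimizing ||a @ R - b||_F for paired batches of shape (n, d)."""
    u, _, vt = np.linalg.svd(a.T @ b)
    return u @ vt

# Synthetic check: "model B" activations are an exact rotation of "model A" activations.
rng = np.random.default_rng(0)
d = 64
acts_a = rng.normal(size=(1000, d))
r_true = random_rotation(d, rng)
acts_b = acts_a @ r_true
r_hat = procrustes_rotation(acts_a, acts_b)
print(np.abs(r_hat - r_true).max())                         # tiny: R is recovered
print(np.linalg.norm(r_true - np.eye(d)) / np.sqrt(2 * d))  # close to 1
```

Under this sketch, activations `b` from the second model could be mapped into the first model's frame as `b @ r_hat.T` before applying the first model's SAE encoder; this illustrates the idea rather than reproducing the paper's pipeline.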

2605.24576 2026-05-26 cs.AI

Associations between echocardiographic traits and AI-ECG predictions of heart failure

Elias Stenhede, Eivind Bjørkan Orstad, Torbjørn Omland, Henrik Schirmer, Arian Ranjbar

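AI summary: A retrospective analysis of data from 8147 patients finds that AI-ECG-predicted heart failure risk correlates primarily with measures of systolic function such as global longitudinal strain, and also captures diastolic-related abnormalities in patients with preserved ejection fraction.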

Abstract

Artificial intelligence-enabled electrocardiography (AI-ECG) can detect heart failure (HF), including disease not captured by left ventricular ejection fraction (LVEF), but the cardiac phenotypes underlying model predictions remain unclear. We therefore investigated whether AI-ECG-predicted HF risk aligns with established echocardiographic measures of myocardial dysfunction, remodelling, and filling pressures. We retrospectively analysed ECG and echocardiography data from 8147 patients who underwent both examinations within three days at Akershus University Hospital between 1 January 2023 and 1 June 2025. A previously validated AI-ECG model for HF detection was applied to all ECGs. Spearman's rank correlation $\rho$ quantified associations between echocardiographic parameters and AI-ECG risk. Subgroup analyses were performed by sex and LVEF. External validation included 36,286 ECG-echocardiography pairs from Columbia University Irving Medical Center. Global longitudinal strain (GLS) showed the strongest correlation ($\rho = 0.57$), followed by mitral annular plane systolic excursion (MAPSE) ($\rho = -0.49$) and LVEF ($\rho = -0.45$). In patients with LVEF > 50%, correlations remained substantial for GLS, MAPSE, and diastolic-related parameters. Volumetric left ventricular indices correlated less strongly in women, whereas diastolic indices showed stronger correlations in women than in men. Physiological validation showed that AI-ECG HF risk predictions align primarily with measures of systolic function, particularly global longitudinal strain, while also capturing diastolic-related abnormalities in patients with preserved LVEF. This approach may improve clinical interpretability and identify opportunities for model refinement.
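
Spearman's $\rho$ is the Pearson correlation of the two variables' ranks, with tied values assigned the average of their positions. A dependency-free sketch follows; the abstract does not say which software the authors used, and in practice a library routine such as `scipy.stats.spearmanr` computes the same statistic.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share one averaged rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

print(spearman_rho([1, 2, 3, 4, 5], [5, 6, 7, 8, 7]))  # about 0.82 (one tie)
```

A negative value, as reported for MAPSE and LVEF, means that higher values of the echocardiographic parameter tend to accompany lower AI-ECG risk.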

2605.24573 2026-05-26 cs.CL

AstroMind: A High-Fidelity Benchmark for Spacecraft Behavior Reasoning Based on Large Language Models

Hao Liu, Siyuan Yang, Qinglei Hu, Dongyu Li

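AI summary: Addresses the problem of understanding spacecraft maneuver behavior by proposing AstroMind, a benchmark built on high-fidelity astrodynamics simulations and real observational constraints that covers three task types (intent inference, parameter estimation, and threat assessment), and evaluates a range of open-weight models on it.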

Abstract

Understanding why a spacecraft maneuvers, rather than simply that it did, is an increasingly important problem for space domain awareness as Earth orbits grow crowded and contested. Current analysis pipelines are built for detection: they are good at picking up that something happened but less good at reasoning about what it means. AstroMind is a physics-grounded benchmark designed to close that gap. It draws on high-fidelity astrodynamics simulations and real observational constraints, converting them into verifiable reasoning problems across three task types: intent inference, maneuver parameter estimation, and threat assessment. Each scenario includes realistic sensing noise and multi-source textual intelligence at varying reliability levels. Evaluation metrics capture both semantic correctness and quantitative consistency under physical constraints. Benchmarking a suite of open-weight models shows that no single model dominates every axis: Qwen3 (32B) leads on intent inference accuracy; QwQ (32B) leads on threat assessment and achieves the lowest median relative error on parsed items; GPT-OSS (20B) produces the strongest judged reasoning quality and extracts the most scalar values for parameter estimation (136 of 241 parsed items). Training data composition and reasoning style matter as much as model size. Structured reasoning prompts help consistently across the tested 8B models, with larger gains for those that can already track physical constraints. AstroMind gives the field a shared test for a problem where getting the physics right and reading the tactical situation correctly are both required; neither is sufficient on its own.

2605.24570 2026-05-26 cs.LG cs.AI cs.CV

PILOT: Policy-Informed Learned Optimization for Adaptive Deep Network Training

Sattam Altuuaim, Lama Ayash, Muhammad Mubashar, Naeemullah Khan

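AI summary: Proposes PILOT, an online optimizer that uses a gradient-direction agreement signal to dynamically adjust the combination of momentum, normalization, and sign-based updates, achieving higher accuracy than the other evaluated optimizers on FashionMNIST and CIFAR-10.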

Comments 16 pages, 5 figures

Abstract

Despite the central role of optimization in deep learning, most optimizers rely on update structures whose functional form is fixed before training begins. This static design can limit their ability to respond to changing gradient behavior across the loss landscape, where training may shift between stable, noisy, and inconsistent regimes. This study proposes PILOT (Policy-Informed Learned OpTimizer), an online optimizer that adapts its update behavior during training. Rather than using a fixed balance between momentum, normalization, and sign-based updates, PILOT uses gradient-direction agreement as a signal of local training stability. Conditioning the update rule on this agreement signal allows the optimizer to adjust its behavior when gradients become stable, noisy, or inconsistent. Experiments on FashionMNIST and CIFAR-10 show that PILOT consistently achieves the highest accuracy among the evaluated optimizers across convolutional settings. On the CNN architecture, PILOT reaches 94.13% on FashionMNIST and 81.94% on CIFAR-10. On ResNet-18, it further improves performance, reaching 95.71% on FashionMNIST and 93.42% on CIFAR-10. These results suggest that learning how to adapt the update structure during training can improve performance across both compact and deeper convolutional models while preserving a simple first-order optimization framework. The implementation of PILOT is publicly available at https://github.com/SattamAltwaim/PILOT.git
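
The abstract does not give PILOT's update rule, so the following is only a toy illustration of the general idea of conditioning an update on gradient-direction agreement. It measures agreement as the cosine between the current gradient and a momentum buffer, then blends a momentum step (when directions agree) with a magnitude-free sign step (when they conflict); the blending rule, learning rate, and momentum coefficient are all hypothetical choices.

```python
import math

def cosine(u, v, eps=1e-12):
    """Cosine similarity between two vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + eps)

def agreement_step(params, grad, state, lr=0.05, beta=0.9):
    """One step of a toy optimizer (NOT PILOT's rule) blending momentum and sign updates.

    agreement = cos(grad, previous momentum) in [-1, 1]; w = (1 + agreement) / 2.
    w near 1 -> follow the momentum buffer; w near 0 -> plain sign step of size lr.
    """
    m = state.get("m", [0.0] * len(params))
    agree = cosine(grad, m) if any(m) else 1.0  # no history yet: treat as agreeing
    m = [beta * mi + (1 - beta) * gi for mi, gi in zip(m, grad)]
    w = (1.0 + agree) / 2.0
    new_params = []
    for p, mi, gi in zip(params, m, grad):
        sign = (gi > 0) - (gi < 0)
        new_params.append(p - lr * (w * mi + (1.0 - w) * sign))
    state["m"] = m
    state["agreement"] = agree
    return new_params

state = {}
p = [1.0, 1.0]
p = agreement_step(p, [1.0, 1.0], state)
p = agreement_step(p, [1.0, 1.0], state)    # agreement close to 1
p = agreement_step(p, [-1.0, -1.0], state)  # gradient flips: agreement close to -1
```

After two identical gradients the step follows the momentum buffer; when the gradient then flips, the agreement drops to about -1 and the step reduces to a plain sign update of size `lr`.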

2605.24566 2026-05-26 cs.CV cs.GR cs.LG

EMA: Effort Metric Attention for Anatomical Effort-Guided Human Motion Diffusion

Joshua Siy, Huakun Liu, Yutaro Hirao, Monica Perusquia-Hernandez, Hideaki Uchiyama, Kiyoshi Kiyokawa

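AI summary: Proposes an intensity-control framework based on Effort Metric Attention (EMA) that conditions a motion diffusion model on numerical effort signals, enabling fine-grained, region-wise control of motion intensity, and shows near-monotonic alignment with LMA descriptors.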

Comments Accepted at IEEE International Conference on Automatic Face and Gesture Recognition (FG 2026)

Abstract

Human motion diffusion models can synthesize action sequences from text, but controlling motion intensity remains challenging. Existing approaches rely on effort-related adverbs, which are ambiguous and fail to capture quantitative aspects such as pacing, often resulting in flat and monotonous dynamics. We propose an intensity-control framework based on Effort Metric Attention (EMA), a cross-attention module that conditions diffusion on numerical effort signals. Inspired by Laban Movement Analysis (LMA), the framework focuses on the Time and Weight effort factors. We approximate these factors using two kinematic metrics: peak joint positional change for pacing and collective joint positional change for motion amount. EMA enables fine-grained, region-wise control without costly post-hoc optimization. We introduce two evaluation tasks, metric-to-motion consistency and body-part-level effort modulation, to assess numerical fidelity and localized control. Experiments and a user study show near-monotonic alignment between specified effort levels, generated motion dynamics, and established LMA descriptors. These results indicate effective and interpretable control of effort dynamics in practice.
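
The two kinematic proxies can be sketched from joint trajectories. This example assumes that positional change means the per-frame Euclidean displacement of each joint, with the peak taken as the maximum over frames and joints (pacing) and the collective value as the sum (motion amount); the paper's exact normalization, frame-rate handling, and joint grouping are not given in the abstract. Passing a subset of joint indices gives a region-wise variant, loosely analogous to body-part-level control.

```python
import math

def joint_displacements(motion):
    """Per-frame, per-joint Euclidean displacement.

    motion: list of frames; each frame is a list of (x, y, z) joint positions.
    Returns T-1 rows, each holding J displacements.
    """
    return [[math.dist(p, q) for p, q in zip(f0, f1)] for f0, f1 in zip(motion, motion[1:])]

def effort_metrics(motion, joints=None):
    """Return (peak, collective) positional change, optionally for a joint subset."""
    disp = joint_displacements(motion)
    if joints is not None:
        disp = [[row[j] for j in joints] for row in disp]
    flat = [d for row in disp for d in row]
    return max(flat), sum(flat)

# Toy clip: 3 frames, 2 joints.
motion = [
    [(0, 0, 0), (1, 0, 0)],
    [(0, 0, 0), (1, 1, 0)],
    [(0, 3, 4), (1, 1, 0)],
]
print(effort_metrics(motion))              # (5.0, 6.0)
print(effort_metrics(motion, joints=[1]))  # (1.0, 1.0)
```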

2605.24564 2026-05-26 cs.AI cs.CE cs.LG

Summoning the Oracle to Slay It: Mitigating Look-Ahead Bias in Financial Backtesting with Large Language Models

Weixian Waylon Li, Mengyu Wang, Tiejun Ma

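AI summary: Proposes FinCAD, which combines adversarial bias discovery with an entity- and date-adaptive rule to suppress a large language model's memory of historical outcomes without retraining, mitigating parametric look-ahead bias in financial backtesting.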

Abstract

Backtesting large language models (LLMs) on historical financial data is unreliable because their pre-training data extends past the events being tested: an LLM trained in 2024 already "knows" which way stocks moved in 2018-2020. We name this failure parametric look-ahead bias and propose FinCAD, an inference-time adaptation of Context-Aware Decoding (CAD) that suppresses an LLM's memory of historical outcomes without retraining. FinCAD pairs an adversarial bias-discovery pipeline, which learns a model-specific memory-activating prior prompt, with an entity- and date-adaptive rule that scales the CAD strength to per-(entity, date) memorisation, so the penalty fires on memorised in-sample dates and decays to zero out-of-sample. Across five 7-14B LLMs and five mega-cap equities, FinCAD cuts in-sample backtest returns by up to 67.1% on memorised dates while keeping 2025 out-of-sample returns within $8K and Sharpe ratios within 0.10 of baseline, and preserves general-purpose reasoning within 1.7 points. On an eleven-model leaderboard, it raises the in-sample / out-of-sample Spearman correlation from +0.779 to +0.846, recovering rankings that genuinely predict out-of-sample performance.
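
Context-Aware Decoding (Shi et al., 2023) combines two sets of next-token logits as $(1+\alpha)\,\ell(y \mid c, x) - \alpha\,\ell(y \mid x)$. The sketch below applies the same arithmetic with the subtracted term computed under a memory-activating prompt, and scales $\alpha$ by a per-(entity, date) memorization score. How FinCAD measures memorization and exactly which prompts it contrasts are not specified in the abstract, so `adaptive_alpha` and the toy logits are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def contrastive_logits(base, prior, alpha):
    """CAD-style combination: (1 + alpha) * base - alpha * prior.

    base:  next-token logits under the ordinary prompt.
    prior: logits under a prompt assumed to activate memorized outcomes.
    """
    return [(1 + alpha) * b - alpha * p for b, p in zip(base, prior)]

def adaptive_alpha(memorization, alpha_max=1.0):
    """Hypothetical rule: CAD strength proportional to a memorization score in [0, 1]."""
    return alpha_max * min(max(memorization, 0.0), 1.0)

# Toy vocabulary: index 0 = "up", index 1 = "down".
base = [2.0, 1.0]   # ordinary prompt mildly prefers "up"
prior = [4.0, 1.0]  # memory-activating prompt strongly prefers "up"
print(contrastive_logits(base, prior, adaptive_alpha(0.0)))  # [2.0, 1.0]
print(contrastive_logits(base, prior, adaptive_alpha(1.0)))  # [0.0, 1.0]
```

With a memorization score of 0 the logits are unchanged, matching the intended out-of-sample behaviour; with a score of 1 the token favoured mainly by the memory-activating prompt ("up" in this toy) loses its advantage.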

2605.24562 2026-05-26 cs.CV cs.AI

PEDESTRIANQA: A Benchmark for Vision-Language Models on Pedestrian Intention and Trajectory Prediction

Naman Mishra, Shankar Gangisetty, C. V. Jawahar

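AI summary: Proposes PedestrianQA, a large-scale video dataset that recasts pedestrian intention and trajectory prediction as question-answering tasks with structured rationales; fine-tuning vision-language models on it significantly improves prediction accuracy and explainability.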

Abstract

Pedestrian intention and trajectory prediction are critical for the safe deployment of autonomous driving systems, directly influencing navigation decisions in complex traffic environments. Recent advances in large vision-language models (VLMs) offer a powerful new paradigm for these tasks by combining high-capacity visual understanding with flexible natural language reasoning. In this work, we introduce PedestrianQA, a large-scale video-based dataset that formulates pedestrian intention and trajectory prediction as question-answering tasks augmented with structured rationales. PedestrianQA expresses richly annotated pedestrian sequences in natural language, enabling VLMs to learn from visual dynamics, contextual cues, and interactions among traffic agents while generating concise explanations of their predictions, without requiring specialized architectures tailored to each task. Empirical evaluations across PIE, JAAD, TITAN, and IDD-PeD show that fine-tuning state-of-the-art VLMs on PedestrianQA significantly improves intention classification, trajectory forecasting accuracy, and the quality of explanatory rationales, demonstrating the strong potential of VLMs as a unified and explainable framework for safety-critical pedestrian behavior modeling.

2605.24558 2026-05-26 cs.LG

Position: AI for Science Should Treat Measurement-to-Dataset Pipelines as Inference Components

Ling Zhan, Xiaoyao Yu, Tao Jia

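AI summary: Argues that measurement-to-dataset pipelines in AI for Science should be treated as inference components, identifies three failure modes caused by treating their outputs as fixed data, demonstrates the severity of the problem with a large-scale neuroscience audit, and calls for computable observation frameworks.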

Comments 23 pages, 5 figures, Proceedings of the 43rd International Conference on Machine Learning, Seoul, South Korea. PMLR 306, 2026

Abstract

AI for Science (AI4Science) workflows often treat the released dataset as a fixed interface to the underlying system. However, in domains relying on indirect observation, the learner observes a derivative representation produced by multi-stage measurement, reconstruction, and preprocessing pipelines. We argue that these measurement-to-dataset pipelines are inference components: treating their outputs as "given data" freezes an observation model and obscures uncertainty over feasible pipeline choices. We identify three failure modes arising from this "frozen lens": (C1) hidden hypothesis space, where the released dataset does not specify the pipeline configuration or its validity conditions; (C2) uncertified transportability, where a pipeline may be documented but its regime of validity is untested, so failures under distribution shift cannot be adjudicated; (C3) ungoverned multiplicity, where many defensible pipelines exist and dispersion is real but not propagated into uncertainty-aware evidence. We stress-test these claims with a large-scale neuroscience empirical audit, finding a survival rate of ≈0.0004% under a cross-dataset stability criterion. We call on the AI4Science community to make pipelines computable inference objects via domain-specific Computable Observation Frameworks. This shift enables quantifying pipeline adequacy and stability, converting implicit implementation choices into auditable, reproducible, and cumulative scientific evidence.

2605.24553 2026-05-26 cs.CV

IQA-Spider: Unifying Multi-Granularity Image Quality Assessment with Reasoning, Grounding and Referring

Xinge Peng, Yiting Lu, Xin Li, Zhibo Chen

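AI summary: Proposes IQA-Spider, which unifies reasoning, grounding, and referring for multi-granularity image quality assessment and uses a two-stage design to overcome existing methods' support for only partial perception dimensions.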

Comments Accepted by ICML 2026

Abstract

We present IQA-Spider, the first image quality assessment (IQA) framework that unifies reasoning, grounding, and referring into a single LMM-based framework for multi-granularity quality understanding. Existing LMM-based IQA methods typically support only partial perception dimensions, such as quality description and question answering (i.e., reasoning) or pixel-level grounding. This limitation largely stems from the absence of (i) a unified task and data formulation and (ii) effective optimization paradigms for multi-granularity learning. To address these limitations, we formulate a rigorous four-task paradigm covering global and local quality description, pixel-level grounding, and region-level referring. Based on this formulation, we construct a corresponding IQA dataset with a scalable and automatic annotation pipeline, thereby providing a solid foundation for unified multi-granularity learning. To further enable unified perception, we adopt a conflict-free two-stage design that progressively extends text-level multi-granularity understanding to pixel-level grounding: (i) the first stage equips the model with fine-grained text-level reasoning across multiple IQA tasks, and (ii) the second stage introduces a training-free text-to-point grounding paradigm, which bridges textual semantics and pixel-level perception by mapping token logits to spatial coordinates. Based on these efforts, we achieve IQA-Spider with unified multi-granularity explainable image quality assessment. Extensive experiments across multiple benchmarks demonstrate strong performance, validating the effectiveness and versatility of the proposed formulation and framework.
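One plausible reading of "mapping token logits to spatial coordinates" is a soft-argmax over coordinate tokens. The sketch below assumes, hypothetically, that each image axis is quantized into bins emitted as their own tokens, and turns the per-bin logits into a softmax-weighted expected position; it illustrates the idea and is not IQA-Spider's decoding rule.

```python
import math


def expected_coordinate(bin_logits):
    """Soft-argmax over coordinate-bin tokens, normalized to [0, 1]."""
    n = len(bin_logits)
    m = max(bin_logits)
    weights = [math.exp(x - m) for x in bin_logits]  # stable softmax numerators
    total = sum(weights)
    return sum(i * w for i, w in enumerate(weights)) / (total * (n - 1))


def logits_to_point(x_logits, y_logits, width, height):
    """Map per-axis bin logits to a pixel location (x, y)."""
    return (expected_coordinate(x_logits) * (width - 1),
            expected_coordinate(y_logits) * (height - 1))


x_logits = [0.0] * 101
x_logits[25] = 50.0     # confident: almost all mass on bin 25 of 101
y_logits = [0.0] * 11   # uncertain: uniform over 11 bins
x, y = logits_to_point(x_logits, y_logits, width=401, height=201)
```

The uniform y-logits land in the middle of the image, which shows the usual caveat of soft-argmax: an uncertain distribution collapses to a mean that need not correspond to any salient point.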

2605.24550 2026-05-26 cs.AI cs.CL cs.LG

Jailbreak to Protect: Buffering and Reinforcing via Temporary Jailbreaking for Safe Fine-Tuning in Large Language Models

Seokil Ham, Jaehyuk Jang, Wonjun Lee, Changick Kim

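AI summary: Targeting the erosion of safety alignment by harmful fine-tuning attacks in fine-tuning-as-a-service, proposes a buffer-and-reinforce framework grounded in gradient analysis that reduces harmful updates with a temporary-jailbreak adapter and reinforces safety via QR-decomposition-based merging, giving an efficient defense that needs no extra safety data.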

Comments ICML 2026 Spotlight

Abstract

Fine-tuning-as-a-Service (FaaS) enables personalization of large language models (LLMs), but it can weaken safety alignment under harmful fine-tuning attacks. Recent work has shown that activating harmful-behavior modules during fine-tuning can prevent models from learning undesired behaviors, but the mechanism behind this remains unclear. In this paper, we revisit temporary jailbreaking as a defense against harmful fine-tuning and provide a gradient-level analysis showing that it saturates safety-degrading gradients while preserving benign task-relevant gradients. Based on this insight, we propose a Buffer-and-Reinforce fine-tuning framework that buffers harmful updates during user fine-tuning and reinforces safety after adaptation. Specifically, BufferLoRA induces temporary jailbreaking as a removable adapter to reduce harmful updates during user fine-tuning. After adaptation, ReinforceLoRA, trained to recover refusal behavior under the temporarily jailbroken state, is integrated with UserLoRA via QR decomposition-based merging to reinforce safety while preserving user-task performance. Extensive experiments show that our framework achieves superior safety and utility with no additional safety data during user fine-tuning and minimal computational cost.

2605.24549 2026-05-26 cs.AI

PALoRA: Projection-Adaptive LoRA for Preserving Reasoning in Large Language Models

Mustafa Hayri Bilgin, Mariam Barry, Albert Bifet, Azzedine Idir Ait Said, Soumya Banerjee

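AI summary: Proposes PALoRA, which uses Singular Value Fine-Tuning (SVF) to identify reasoning-critical components and then injects knowledge with LoRA under an orthogonality constraint, efficiently updating factual knowledge while preserving reasoning ability.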

Abstract

Efficiently updating Large Language Models (LLMs) with new or evolving factual knowledge remains a central challenge, as even parameter-efficient adaptation can erode previously acquired reasoning abilities. This tension reflects a plasticity-stability dilemma: models must incorporate new knowledge while preserving skill-critical representations. In this work, we study this trade-off through the spectral structure of multilayer perceptron weight matrices. We show, both theoretically and empirically, that information essential for reasoning is not localized only in dominant singular directions, but is instead distributed across the singular spectrum. Motivated by this observation, we introduce PALoRA, a two-stage framework for knowledge injection with reduced interference. PALoRA first trains a Singular Value Fine-Tuning (SVF) expert on a reasoning dataset and uses its learned singular scaling vector as a frozen geometric probe to identify components that are critical for the target skill. It then performs factual knowledge injection with Low-Rank Adaptation (LoRA) under a structural orthogonality constraint, ensuring that updates avoid the identified skill-relevant subspace. Across Llama 3.1 8B and Mistral 7B, and across mathematical, coding, and scientific reasoning benchmarks, PALoRA preserves on average 95% of the SVF expert's reasoning performance while maintaining competitive factual recall. It consistently improves skill retention over prior spectral Parameter-Efficient Fine-Tuning (PEFT) methods while adding less than 0.006% parameter overhead.
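A minimal sketch of the orthogonality constraint, under two assumptions not stated in the abstract: (a) a singular direction counts as skill-critical when its learned SVF scale deviates from 1 by more than a hypothetical tolerance `tol`, and (b) the constraint is enforced by projecting every column of the LoRA update onto the orthogonal complement of the selected left singular vectors, i.e. (I - U_S U_S^T) ΔW. Plain lists stand in for tensors so the example stays dependency-free.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def select_critical(svf_scales, tol=0.1):
    """Hypothetical probe: indices of singular directions whose learned SVF
    scale differs from 1 by more than tol."""
    return [i for i, z in enumerate(svf_scales) if abs(z - 1.0) > tol]


def project_out(vec, basis):
    """Apply (I - U U^T) to vec, assuming basis holds orthonormal vectors U."""
    out = list(vec)
    for u in basis:
        c = dot(out, u)
        out = [o - c * ui for o, ui in zip(out, u)]
    return out


def constrain_update(delta_columns, left_singular_vectors, svf_scales, tol=0.1):
    """Project every column of the LoRA update away from the critical subspace."""
    critical = [left_singular_vectors[i] for i in select_critical(svf_scales, tol)]
    return [project_out(col, critical) for col in delta_columns]


# Toy 3x3 weight whose left singular vectors are the standard basis.
U = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
svf_scales = [1.5, 1.0, 0.98]                   # only direction 0 is treated as critical
delta_columns = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
constrained = constrain_update(delta_columns, U, svf_scales)
```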

2605.24548 2026-05-26 cs.LG math.PR

Deep ZakaiJ: Structured Filtering for Jump-Diffusion Time Series Forecasting

Yan Leng, Thibaut Mastrolia, Hao Wang

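AI summary: Proposes Deep ZakaiJ, which embeds the Zakai nonlinear filtering equation in a neural encoder-decoder architecture and updates latent-state beliefs via Strang splitting for partially observed jump-diffusion systems; on synthetic, financial, and oceanographic datasets it improves distributional forecasts while remaining competitive in point accuracy.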

Abstract

Time series driven by unobserved latent states frequently exhibit abrupt jump discontinuities whose timing and magnitude cannot be predicted from observed history alone. Classical jump-diffusion models offer a principled mathematical framework but assume rigid parametric forms, while recent neural jump models operate on fully observed trajectories without inferring the hidden states that govern the dynamics. We propose Deep ZakaiJ, a latent-state model for partially observed jump-diffusion systems that embeds the Zakai nonlinear filtering equation into a neural encoder-decoder architecture. The encoder recursively updates a belief over the latent state via Strang splitting into three interpretable substeps: prior propagation, diffusion innovation, and jump innovation, yielding a differentiable, first-order-accurate approximation of the exact filtering evolution. The decoder is a structured jump-diffusion model explicitly conditioned on the filtered belief, preserving the separation between continuous dynamics and discontinuous shocks. On synthetic, financial, and oceanographic datasets, Deep ZakaiJ improves distributional forecasts while remaining competitive in point accuracy, achieving calibrated predictive intervals and recovering interpretable latent structure in synthetic and qualitative case studies.
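The three substeps can be sketched on a 1-D state grid, assuming pure diffusion as the prior dynamics, a Brownian observation with sensor function h(x), and a jump observation modeled as a counting process with state-dependent intensity lam(x). The innovations use the standard Zakai likelihood-ratio factors, exp(h*dY - h^2*dt/2) and lam^dN * exp(-(lam - 1)*dt) relative to a unit-rate reference process; the half-step / innovations / half-step ordering is one symmetric (Strang-type) arrangement chosen for illustration. All names and parameters are hypothetical, and in Deep ZakaiJ these operators are learned neural components rather than fixed formulas.

```python
import math


def propagate(rho, dt, diffusivity=0.5, dx=1.0):
    """Prior propagation: one explicit heat-equation step with zero-flux
    boundaries; mass-preserving and nonnegative while diffusivity*dt/dx**2 <= 0.5."""
    k = diffusivity * dt / dx ** 2
    n = len(rho)
    out = []
    for i in range(n):
        left = rho[i - 1] if i > 0 else rho[i]
        right = rho[i + 1] if i < n - 1 else rho[i]
        out.append(rho[i] + k * (left - 2.0 * rho[i] + right))
    return out


def diffusion_innovation(rho, h, dy, dt):
    """Brownian-observation update: multiply by exp(h*dY - h**2*dt/2)."""
    return [r * math.exp(hi * dy - 0.5 * hi * hi * dt) for r, hi in zip(rho, h)]


def jump_innovation(rho, lam, dn, dt):
    """Counting-process update: multiply by lam**dN * exp(-(lam - 1)*dt)."""
    return [r * li ** dn * math.exp(-(li - 1.0) * dt) for r, li in zip(rho, lam)]


def strang_step(rho, h, lam, dy, dn, dt):
    """Half prior step, both innovations, half prior step, then normalize
    (the Zakai density is unnormalized; normalizing yields the filter)."""
    rho = propagate(rho, dt / 2.0)
    rho = diffusion_innovation(rho, h, dy, dt)
    rho = jump_innovation(rho, lam, dn, dt)
    rho = propagate(rho, dt / 2.0)
    total = sum(rho)
    return [r / total for r in rho]


# Toy example: 10 grid states, sensor h(x) grows with x, no jump observed.
n = 10
prior = [1.0 / n] * n
h = [0.1 * i for i in range(n)]
posterior = strang_step(prior, h, lam=[1.0] * n, dy=1.0, dn=0, dt=0.1)
```

A positive observation increment with an increasing sensor function shifts belief toward higher states, which is the qualitative behavior the diffusion-innovation substep is meant to capture.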

2605.24547 2026-05-26 cs.LG

RL with Learnable Textual Feedback: A Bilevel Approach

Utsav Singh, Sidhaarth Sredharan, Souradip Chakraborty, Amrit Singh Bedi

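AI summary: To address the sample inefficiency caused by sparse rewards, proposes Bi-NAC, a bilevel optimization framework that jointly trains a critic to generate policy-improving textual feedback and an actor to exploit it, improving sample and parameter efficiency on tasks such as MATH-500.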

Abstract

Reinforcement learning with verifiable rewards can improve LLM reasoning, but learning remains sample-inefficient when terminal rewards are sparse. This has motivated a growing line of work on RL with textual feedback, where a critic model generates natural language feedback to guide a reasoning model (the actor), augmenting scalar rewards with richer learning signals. However, existing methods typically treat feedback as fixed or auxiliary, which misses a key property: feedback should not merely be correct, but should improve the policy (actor model) when provided in context. This motivates a paradigm of learnable textual feedback for RL. Yet the learnability and usefulness of feedback depend on the policy's ability to learn from it, making RL with learnable feedback an inherently bilevel problem. We formalize this coupling as a Stackelberg bilevel program and derive Bilevel Natural Language Actor-Critic (Bi-NAC), which jointly trains a critic to generate reward-improving feedback and an actor to exploit it. Across MATH-500, MBPP, and GPQA, Bi-NAC improves sample and parameter efficiency over RL and fixed-critic baselines: our 2B model outperforms the 3B GRPO baseline, achieving 46.6% versus 41.4% on MATH-500, while our 6B model surpasses the 7B GRPO baseline, achieving 49.3% versus 43.6% on GPQA.

2605.24546 2026-05-26 cs.AI cs.IR

Beyond Control-Flow: Integrating the Resource Perspective into Multi-Collaborative Process Modeling from Text

Anton Antonov, Humam Kourani, Alessandro Berti, Gyunam Park

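AI summary: Proposes a resource-aware generation pipeline that automatically produces BPMN 2.0 collaboration diagrams with organizations (pools) and roles (lanes) from natural-language descriptions, preserving control-flow quality with only a small runtime overhead.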

Comments Submitted to EDOC 2026, under review

Abstract

Process modeling is a sub-domain of Business Process Management (BPM) focused on translating process artifacts into formal models. This task traditionally requires extensive human input and domain expertise in both BPM notations and the specific business context. While Large Language Models (LLMs) can now automate much of this manual work, current text-to-model approaches focus predominantly on the control-flow perspective, ordering activities without considering the collaborative aspect of the processes. In this paper, we introduce a resource-aware generation pipeline that produces formal BPMN 2.0 collaboration diagrams from natural-language descriptions. Rather than solely prompting an LLM for raw XML, we describe a compact, executable intermediate language with mandatory resource details defining both the organization (pool) and the role (lane). Cross-organization dependencies are materialized through message events, the standard formal notation for such interactions, while an orthogonal layout routine automatically handles the spatial arrangement of elements within pools and lanes. Experiments on ten business processes with nine LLMs show strong resource discovery while preserving control-flow quality and adding only marginal runtime overhead. This approach moves generative modeling toward a more comprehensive, multi-collaborative representation of business operations.
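A hypothetical fragment of such an intermediate representation: every activity must name its pool (organization) and lane (role), and edges between activities in different pools are emitted as message flows, because BPMN 2.0 does not allow sequence flows to cross pool boundaries. The `Activity` type and `classify_flows` helper are illustrative assumptions, not the paper's language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    """Hypothetical intermediate-language node with mandatory resource info."""
    name: str
    pool: str  # organization
    lane: str  # role inside the organization

    def __post_init__(self):
        if not self.pool or not self.lane:
            raise ValueError(f"activity {self.name!r} needs both a pool and a lane")


def classify_flows(activities, edges):
    """Split (source, target) edges into sequence flows (same pool) and
    message flows (different pools)."""
    by_name = {a.name: a for a in activities}
    sequence_flows, message_flows = [], []
    for src, dst in edges:
        if by_name[src].pool == by_name[dst].pool:
            sequence_flows.append((src, dst))
        else:
            message_flows.append((src, dst))
    return sequence_flows, message_flows


activities = [
    Activity("Place order", pool="Customer", lane="Buyer"),
    Activity("Receive order", pool="Shop", lane="Sales"),
    Activity("Ship goods", pool="Shop", lane="Warehouse"),
]
edges = [("Place order", "Receive order"), ("Receive order", "Ship goods")]
sequence_flows, message_flows = classify_flows(activities, edges)
```

Here the cross-organization edge becomes a message flow, while the hand-off from Sales to Warehouse stays a sequence flow because both lanes sit in the same pool.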

2605.24545 2026-05-26 cs.LG cs.AI

Rethinking Federated Unlearning via the Lens of Memorization

Jiaheng Wei, Yanjun Zhang, He Zhang, Leo Yu Zhang, Chao Chen, Kok-Leong Ong, Jun Zhang, Yang Xiang

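AI summary: To address ineffective unlearning and client unfairness caused by overlap between forgotten and retained data in federated learning, proposes a federated memorization pruning method based on grouped memorization evaluation, which unlearns efficiently by resetting redundant parameters responsible for memorization.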

Comments This paper has been accepted by SIGKDD 2026

Abstract

Federated learning (FL) increasingly needs machine unlearning to comply with privacy regulations. However, existing federated unlearning approaches may overlook the overlapping information between the unlearning and remaining data, leading to ineffective unlearning and unfairness between clients. In this work, we revisit federated unlearning through the lens of memorization. We argue that unlearning should mainly remove the unique memorized information attributable to the data to be forgotten, while preserving overlapping patterns that are also supported by the remaining data. Specifically, we propose Grouped Memorization Evaluation, an example-level metric that separates memorized knowledge from overlapping knowledge. Building on this metric, we introduce Federated Memorization Pruning (FedMemPrune), a pruning-based unlearning approach that resets redundant parameters responsible for memorization. Extensive experiments show that FedMemPrune closely matches retraining-based unlearning baselines while more effectively eliminating memorization than existing federated unlearning algorithms, yielding strong unlearning performance without sacrificing the utility of retained knowledge.
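To make the memorized-versus-overlapping distinction concrete, the sketch below swaps in a Feldman-style subsampling estimate: an example's memorization score is the accuracy of models trained with it minus the accuracy of models trained without it. A high score suggests knowledge unique to that example; a score near zero suggests the pattern is also supported by other data. This illustrates the idea only; it is not the paper's Grouped Memorization Evaluation, and the threshold is a hypothetical choice.

```python
def memorization_score(records):
    """records: (in_training_subset, predicted_correctly) pairs for one example,
    one pair per model trained on a random subset of the data."""
    included = [correct for in_train, correct in records if in_train]
    excluded = [correct for in_train, correct in records if not in_train]
    if not included or not excluded:
        raise ValueError("need models trained both with and without the example")
    return sum(included) / len(included) - sum(excluded) / len(excluded)


def split_forget_set(scores, threshold=0.5):
    """Partition forget-set examples into uniquely memorized vs. overlapping."""
    memorized = [ex for ex, s in scores.items() if s >= threshold]
    overlapping = [ex for ex, s in scores.items() if s < threshold]
    return memorized, overlapping


# Example A is mostly learned only when it is in the training subset;
# example B is predicted correctly either way (overlapping knowledge).
records_a = [(True, True)] * 4 + [(False, True)] + [(False, False)] * 3
records_b = [(True, True)] * 4 + [(False, True)] * 4
scores = {"A": memorization_score(records_a), "B": memorization_score(records_b)}
memorized, overlapping = split_forget_set(scores)
```

Under this reading, unlearning would target what is specific to A while leaving the pattern behind B intact, since the remaining data still supports it.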

2605.24543 2026-05-26 cs.AI cs.SY eess.SY

Emission-Aware Reinforcement Learning for Sustainable Electric Vehicle Charging and Carbon Dioxide Reduction Under Varying Renewable Penetration

Ninglin Ou, Mohammad A. Razzaque, Iftekher Islam Shovon, Shafkat Khan Siam, Shafiuzzaman K Khadem, Krishnendu Guha, Mayeen U Khandaker, Md. Noor-A-Rahim

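AI summary: Proposes an emission-aware reinforcement learning strategy based on Soft Actor-Critic that optimizes EV charging schedules with a multi-objective reward, achieving up to 87% carbon-emission reduction and 52% renewable self-consumption on the EV2Gym platform.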

Comments Submitted to the journal Engineering Applications of Artificial Intelligence (Elsevier)

Abstract

The rapid growth of Electric Vehicle (EV) adoption challenges power distribution networks through peak load spikes, voltage instability, and transformer overloads from uncoordinated charging. While Model Predictive Control (MPC) and standard Reinforcement Learning (RL) methods have addressed these issues, existing approaches rarely treat real-time carbon intensity or fluctuating renewable energy (RE) availability as primary scheduling objectives, leaving substantial decarbonisation potential unrealised. This paper proposes an emission-aware RL strategy based on the Soft Actor-Critic (SAC) algorithm, with a multi-objective reward that penalises carbon emissions, curtailed on-site renewables, and unmet user demand. The agent is trained within a unified benchmarking framework on the EV2Gym platform, incorporating behind-the-meter solar and wind profiles, time-varying EirGrid carbon intensity data, and realistic workplace EV behaviour across 25 Electric Vehicle Supply Equipment (EVSE) units. Nine control strategies, including heuristics, emission-aware MPC variants, and the proposed RL agent, are compared under five renewable penetration scenarios (0%-50%) over ten independent runs each. The RL agent achieves a carbon intensity as low as 23.96 grams of carbon dioxide per kilowatt-hour under 50% wind penetration, representing up to 87% emission reduction versus the uncontrolled baseline, and outperforms the external graph-based Power Distribution Network (PDN) benchmark. Transformer overload remains below 7 kWh across scenarios, against up to 1093 kWh for the As Fast As Possible (AFAP) heuristic, and renewable self-consumption reaches 52% under combined wind and solar supply. Embedding carbon intensity forecasts into the RL state and reward aligns charging with low-emission periods while preserving grid compliance and user satisfaction.
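A per-step reward with the three penalty terms named above might look like the following sketch. It assumes on-site renewables serve the charging load first, any surplus is curtailed (no storage or export), grid imports are charged at the current carbon intensity in gCO2/kWh, and the weights `w_carbon`, `w_curtail`, and `w_unmet` are hypothetical values, not the paper's.

```python
def step_reward(load_kwh, renewable_kwh, carbon_g_per_kwh, unmet_kwh,
                w_carbon=1e-3, w_curtail=0.5, w_unmet=2.0):
    """Return (reward, grid emissions in grams CO2) for one control step."""
    grid_import_kwh = max(0.0, load_kwh - renewable_kwh)  # shortfall bought from the grid
    curtailed_kwh = max(0.0, renewable_kwh - load_kwh)    # on-site surplus that is wasted
    emissions_g = grid_import_kwh * carbon_g_per_kwh
    penalty = (w_carbon * emissions_g
               + w_curtail * curtailed_kwh
               + w_unmet * unmet_kwh)
    return -penalty, emissions_g


# Evening step: 10 kWh of charging, 4 kWh of solar, 300 g/kWh grid, 1 kWh unmet.
reward, emissions = step_reward(10.0, 4.0, 300.0, 1.0)
```

Because the carbon term multiplies grid imports by the time-varying intensity, the same kilowatt-hour is cheaper for the agent during low-emission periods, which is what pushes charging toward those windows.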

2605.24541 2026-05-26 cs.LG cs.AI cs.CL cs.IR

SemanticZip: A Pilot Framework for Lossy Text Compression with LLMs as Semantic Decompressors

Natalia Trukhina, Vadim Vashkelis

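AI summary: Proposes SemanticZip, a framework in which text is compressed into compact codes that an LLM decompresses into task-relevant meaning; across six representations including structured prose and JSON, structured prose has the highest recovery (WAR = 0.956, 19.1% token gain) while CCL-Min offers the best balance (39.4% token gain, WAR = 0.874).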

Comments 13 pages, 1 figure, 2 tables. Pilot framework paper; code and supplementary artifacts available in ancillary files

Abstract

Text compression for large language model (LLM) systems is usually framed as token deletion, retrieval, summarization, or exact reconstruction. We study a more aggressive but explicitly lossy setting: compress text into compact codes that an LLM can expand into task-relevant meaning. We call this setting SemanticZip. Unlike lossless compression, SemanticZip does not require byte-identical reconstruction; unlike ordinary summarization, it treats model-based decompression as part of the codec and evaluates whether task-relevant semantic commitments are recovered. This paper is a pilot framework, not a benchmark claim. We formalize LLM-mediated decompression, define a protected/lossy packet architecture, and evaluate six representation regimes (structured prose, JSON, CCL-Core, CCL-Min, SemanticZip ASCII, and SemanticZip emoji) over five author-constructed diagnostic cases. An independent decoder LLM reconstructs typed semantic atoms from each compressed representation, and we score Critical Atom Recall (CAR), Weighted Atom Recall (WAR), precision, and tokenizer gain. In this pilot, structured prose has the highest recoverability, with WAR = 0.956 and a 19.1% o200k_base token gain. CCL-Min is the strongest balanced point, with a 39.4% token gain and WAR = 0.874. SemanticZip ASCII provides the largest useful compression, with a 46.5% token gain and WAR = 0.802, while emoji-heavy SemanticZip performs worse on both compression and recovery. The main contribution is not the claim that these numbers establish a universal frontier. Rather, we introduce a reproducible experimental interface for studying lossy, LLM-decompressible text codes, together with a design principle: safety-critical and exact commitments should remain protected, while predictable, low-risk context may be semantically zipped.
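The scores can be sketched under assumed definitions: token gain as 1 - compressed/original token count, WAR as the weight fraction of gold atoms the decoder recovers, CAR as the fraction of critical atoms recovered, and precision as the fraction of decoded atoms that match a gold atom. Atom names and weights below are made up, and token counts are taken as given rather than computed with an o200k_base tokenizer; the paper's exact formulas may differ.

```python
def token_gain(original_tokens, compressed_tokens):
    """Fraction of tokens saved by the compressed representation."""
    return 1.0 - compressed_tokens / original_tokens


def atom_scores(gold, decoded):
    """gold: {atom_name: (weight, is_critical)}; decoded: set of atom names
    the decoder LLM produced. Returns (CAR, WAR, precision)."""
    recovered = {name for name in decoded if name in gold}
    total_weight = sum(w for w, _ in gold.values())
    war = sum(w for name, (w, _) in gold.items() if name in recovered) / total_weight
    critical = [name for name, (_, is_crit) in gold.items() if is_crit]
    car = (sum(1 for name in critical if name in recovered) / len(critical)
           if critical else 1.0)
    precision = len(recovered) / len(decoded) if decoded else 0.0
    return car, war, precision


gold = {"deadline": (3.0, True), "owner": (2.0, True), "tone": (1.0, False)}
decoded = {"deadline", "tone", "invented_detail"}
car, war, precision = atom_scores(gold, decoded)
```

Separating CAR from WAR matters for the protected/lossy split: a code can score a respectable WAR while still dropping a critical atom, which is exactly the failure that protecting exact commitments is meant to prevent.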

2605.24539 2026-05-26 cs.AI

DemoEvolve: Overcoming Sparse Feedback in Agentic Harness Evolution with Demonstrations

Lirong Che, Yuzhe Yang, Peiwen Lin, Chuang Wang, Xueqian Wang, Jian Su

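AI summary: Proposes DemoEvolve, which uses human demonstrations to guide harness evolution, addressing the fragility of self-generated rollouts under sparse feedback and high variance in long-horizon stochastic environments; its effectiveness is validated on Liar's Dice and Balatro.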

Abstract

Agent harness evolution improves frozen language-model agents by modifying the executable structures around them. We study this paradigm as a form of sample-efficient fast adaptation: instead of updating model weights, an agent can acquire task-specific competence by changing its external harness, while leaving the base model's general capabilities intact. Prior work shows that self-generated rollouts can support harness search, suggesting that agents may acquire new task competence through practice. Yet in long-horizon stochastic environments, self-practice becomes fragile: rewards are sparse, outcomes are high-variance, and failures are hard to attribute to concrete harness mechanisms. We introduce DemoEvolve, a demonstration-bootstrapped approach to harness evolution. When reward-only search is too broad and noisy, competent human trajectories serve as expert reference experience for the coding proposer, guiding harness-level diagnosis and editing. Experiments on Liar's Dice show that self-rollout evolution can work when episodes are short and failures are attributable. In contrast, Balatro exposes a harder long-horizon stochastic regime, where self-rollout evolution is misled by sparse feedback and candidate-selection noise, while tutorial-like textual knowledge alone does not yield stable improvement. Under the same limited budget, DemoEvolve produces more effective and auditable harness edits and achieves better performance. Overall, demonstrations make sparse-feedback harness evolution more diagnosable, localizable, and stable.

2605.24534 2026-05-26 cs.CL

Generating Legal Commentaries from Case Databases via Retrieval, Clustering, and Generation

Max Prior, Niklas Wais, Matthias Grabmair

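AI summary: Proposes a fully automated pipeline that uses retrieval, clustering, and generation to produce legal commentaries from court decisions without a handcrafted doctrinal framework.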

Abstract

We present a fully automated pipeline that transforms large collections of court decisions into legal commentaries for statutes, without providing any handcrafted doctrinal framework. Using 4,555 decisions of the German Federal Court of Justice that cite sections 242, 280, 812 and 823 of the German Civil Code (BGB), we extract paragraph-level chunks, summarize their reasoning, and derive keywords, which are embedded and clustered. For each cluster, an LLM generates headings and synthesizes citation-rich sections, which are then merged into coherent commentaries by four state-of-the-art LLMs. We evaluate along five dimensions (topical relevance, heading match, citation faithfulness, cluster distinction, and logical ordering) using both a human expert and an LLM judge. Our results show that mining commentary-like arguments from court decisions to generate reports that can be refreshed within minutes at minimal cost is feasible, but they also highlight limitations arising from restricted sources and the normativity of legal reasoning.
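The embed-and-cluster step can be illustrated with plain k-means on toy 2-D vectors standing in for keyword embeddings. The abstract does not name the clustering algorithm, so k-means with deterministic farthest-point initialization is an assumption made for this sketch, and the keyword vectors are invented.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's old center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


keywords = ["good faith", "estoppel", "unjust enrichment", "restitution"]
vectors = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0)]
labels, _ = kmeans(vectors, 2)
clusters = {}
for kw, lab in zip(keywords, labels):
    clusters.setdefault(lab, []).append(kw)
```

Each resulting cluster would then be handed to the LLM for a heading and a citation-rich section, as described above.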

2605.24533 2026-05-26 cs.CV

Learnable Shape Prototypes with Occlusion-Geometry-Guided Injection for Amodal Instance Segmentation

Fufan Zhang, Jingxiang Wang, Xiangjie Ye

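AI summary: Proposes a gated reliability-adaptive shape prior framework that produces instance-adaptive shape priors from learnable prototypes via cross-attention and modulates injection strength with the signed distance field of the visible mask, outperforming existing methods under multiple evaluation settings.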

Comments 13 pages, 7 figures, 5 tables. Submitted to IEEE Transactions on Circuits and Systems for Video Technology

Abstract

Amodal instance segmentation aims to predict the complete object mask including occluded regions that lack pixel-level observations and must be inferred with the aid of shape priors. Existing methods acquire shape priors through fixed-capacity encoding spaces or expensive generative models, and inject them uniformly across all spatial positions without adapting to the varying prior demand between visible and occluded regions. In this paper, we propose a gated reliability-adaptive shape prior framework, which introduces a shape prior memory module that combines learnable prototypes via cross-attention to produce instance-adaptive shape priors through weighted prototype combination rather than generation. A spatial adaptive reliability gate then employs the signed distance field of the visible mask to modulate injection intensity at each position according to its occlusion depth, preserving reliable features in visible regions while directing shape compensation toward occluded areas. Experiments on two mainstream amodal instance segmentation benchmarks demonstrate that the proposed method outperforms existing approaches under multiple evaluation settings, improving the mean intersection-over-union over occluded regions by over 11 percentage points on one of the two benchmarks under the standard setting, while using approximately one-third of the total parameters. Linear probing analysis further reveals that the visible-mask cross-attention module implicitly encodes occlusion geometry into visual token representations, explaining the effectiveness of the proposed module decomposition.
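A minimal sketch of SDF-guided gating, assuming the gate is a sigmoid of the signed distance (negative inside the visible mask, positive outside) with a hypothetical temperature `tau`, so injected prior features are damped on visible pixels and strengthened the deeper a pixel lies in the non-visible region. A brute-force SDF on a tiny mask keeps it dependency-free; the paper's gate is learned and may differ.

```python
import math


def signed_distance(mask):
    """Brute-force signed distance field of a small boolean mask:
    negative inside the visible region, positive outside it."""
    h, w = len(mask), len(mask[0])
    inside = [(i, j) for i in range(h) for j in range(w) if mask[i][j]]
    outside = [(i, j) for i in range(h) for j in range(w) if not mask[i][j]]
    if not inside or not outside:
        raise ValueError("mask must contain both visible and non-visible pixels")
    sdf = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            targets = outside if mask[i][j] else inside
            d = min(math.hypot(i - a, j - b) for a, b in targets)
            sdf[i][j] = -d if mask[i][j] else d
    return sdf


def gated_injection(features, priors, sdf, tau=2.0):
    """Add shape-prior features scaled by a per-pixel gate sigmoid(sdf / tau)."""
    return [[f + p / (1.0 + math.exp(-s / tau)) for f, p, s in zip(fr, pr, sr)]
            for fr, pr, sr in zip(features, priors, sdf)]


# 5x5 toy: the two left columns are visible, the rest is occluded.
mask = [[j < 2 for j in range(5)] for _ in range(5)]
sdf = signed_distance(mask)
zeros = [[0.0] * 5 for _ in range(5)]
ones = [[1.0] * 5 for _ in range(5)]
fused = gated_injection(zeros, ones, sdf)  # equals the gate values here
```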

2605.24532 2026-05-26 cs.CV

Image-Conditioned Instance Prompt Network for Referring Remote Sensing Image Segmentation

Biaoyu Ren, Qingsheng Wang, Cun Xu, Dingkang Yang, Wenxuan Wang

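AI summary: Proposes the Image-Conditioned Instance Prompt Network (ICIPNet), which uses self-adaptive visual-semantic representations and a Bilateral Information Fusion module to relieve the cross-modal feature fusion bottleneck and improve referring remote sensing image segmentation.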

Comments 6 pages, 3 figures. Equal contribution: Biaoyu Ren and Qingsheng Wang. Corresponding authors: Dingkang Yang and Wenxuan Wang

Abstract

Referring Remote Sensing Image Segmentation (RRSIS) is a situated, task-driven cross-modal task related to the embodied perception paradigm, requiring models to align visual-spatial features with linguistic intentions for precise target perception. Recent research has focused on refining the granularity of textual features and optimizing image-text feature fusion to better guide target feature representations. However, insufficient descriptive granularity and sensitivity to semantic shifts can cause bottlenecks in cross-modal feature fusion. To address these issues, we propose the Image-Conditioned Instance Prompt Network (ICIPNet) with Bilateral Information Fusion, which is designed to alleviate bottlenecks in cross-modal feature fusion. ICIPNet introduces an Image-Conditioned Instance Prompt (ICIP) module to generate self-adaptive visual and semantic representations without external knowledge. The Bilateral Information Fusion (BIF) module enhances feature fusion along the token and channel dimensions. Experiments demonstrate that the proposed ICIPNet outperforms existing RRSIS models.

2605.24531 2026-05-26 cs.CV

NudgeVAD: Language-Nudged End-to-End Driving via FiLM Residuals

Chieh-Chi Yang, Yu-Hsiang Chen, Yi-Ting Chen

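AI summary: Proposes NudgeVAD, which uses language as a calibrated nudge through identity-initialized FiLM and a zero-initialized residual head, substantially improving driving trajectory prediction when commands are unreliable.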

Comments Technical report for the doScenes Instructed Driving Challenge, CVPR 2026 DriveX Workshop. 1st place in the Ablation track

Abstract

Natural-language instructions promise controllable end-to-end driving, but their benefit can be hidden when planners already receive reliable high-level commands. We propose NudgeVAD, a frozen-planner residual framework that uses language as a calibrated nudge to a VAD trajectory. With identity-initialized FiLM and a zero-initialized residual head, NudgeVAD is equivalent to the frozen planner at initialization, so learned deviations arise only from language-conditioned residuals. We evaluate NudgeVAD along a command-reliability axis. With reliable commands, language improves the initial planner but becomes nearly redundant once compared against VAD-FT (UNCOND), a compute-matched VAD model fine-tuned without language. With random commands, however, language becomes essential: detaching text degrades ADE6s to 3.166 m, while NudgeVAD with text recovers an ADE6s of 2.806 m and outperforms VAD-FT (UNCOND) by 0.312 m. These results show that language is not universally additive; it is most valuable when the categorical command channel is unreliable.
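The initialization argument can be checked with a small sketch: if the text-to-(gamma, beta) projections have zero weights and a gamma bias of 1, FiLM is the identity map, and if the residual head is all zeros, the output equals the frozen planner's trajectory exactly. Dimensions, plain-list layers, and the `FiLMResidual` name are hypothetical; the real model operates on VAD's learned features.

```python
class FiLMResidual:
    """Hypothetical language-conditioned residual on top of a frozen planner."""

    def __init__(self, feat_dim, text_dim, traj_dim):
        # Text -> (gamma, beta) projections. Zero weights plus a gamma bias of 1
        # make FiLM the identity map at initialization.
        self.w_gamma = [[0.0] * text_dim for _ in range(feat_dim)]
        self.b_gamma = [1.0] * feat_dim
        self.w_beta = [[0.0] * text_dim for _ in range(feat_dim)]
        self.b_beta = [0.0] * feat_dim
        # Residual head: all zeros, so the trajectory offset starts at exactly 0.
        self.w_head = [[0.0] * feat_dim for _ in range(traj_dim)]
        self.b_head = [0.0] * traj_dim

    @staticmethod
    def _linear(weights, bias, x):
        return [b + sum(w * xi for w, xi in zip(row, x)) for row, b in zip(weights, bias)]

    def film(self, feats, text_emb):
        """Feature-wise affine modulation conditioned on the text embedding."""
        gamma = self._linear(self.w_gamma, self.b_gamma, text_emb)
        beta = self._linear(self.w_beta, self.b_beta, text_emb)
        return [g * f + b for g, f, b in zip(gamma, feats, beta)]

    def forward(self, base_traj, feats, text_emb):
        """Frozen planner trajectory plus a language-conditioned offset."""
        offset = self._linear(self.w_head, self.b_head, self.film(feats, text_emb))
        return [p + o for p, o in zip(base_traj, offset)]


module = FiLMResidual(feat_dim=4, text_dim=3, traj_dim=6)
base_traj = [0.0, 0.0, 1.0, 0.5, 2.0, 1.0]  # flattened (x, y) waypoints
feats = [0.3, -1.2, 2.0, 0.7]
text_emb = [0.5, -0.1, 0.9]
```

Any deviation from the frozen trajectory therefore has to be learned into the FiLM projections and the head, which is what lets the comparison isolate the contribution of language.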