arXivDaily: arXiv daily academic digest, updated Monday through Friday
2511.17994 2026-05-12 cs.LG stat.ML

Learning Rate Scheduling with Matrix Factorization for Private Training

Nikita P. Kalinin, Joel Daniel Andersson

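AI summary This paper studies differentially private model training under learning rate scheduling and correlated noise. The authors introduce correlated noise via matrix factorization to improve model accuracy and, for the non-constant learning rate schedules widely used in practice, derive general upper and lower error bounds for single- and multi-epoch training. Building on this analysis, they propose a learning-rate-aware matrix factorization that outperforms the conventional prefix-sum factorization under several error metrics, and experiments on CIFAR-10 and IMDB confirm its effectiveness.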

Comments Accepted at FORC 2026

Abstract

We study differentially private model training with stochastic gradient descent under learning rate scheduling and correlated noise. Although correlated noise, in particular via matrix factorizations, has been shown to improve accuracy, prior theoretical work focused primarily on the prefix-sum workload. That workload assumes a constant learning rate, whereas in practice learning rate schedules are widely used to accelerate training and improve convergence. We close this gap by deriving general upper and lower bounds for a broad class of learning rate schedules in both single- and multi-epoch settings. Building on these results, we propose a learning-rate-aware factorization that achieves improvements over prefix-sum factorizations under both MaxSE and MeanSE error metrics. Our theoretical analysis yields memory-efficient constructions suitable for practical deployment, and experiments on CIFAR-10 and IMDB datasets confirm that schedule-aware factorizations improve accuracy in private training.
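
To make the two workloads concrete, here is a minimal Python sketch. It assumes the learning-rate-weighted workload has entries A[t][s] = eta_s for s <= t (which reduces to the prefix-sum matrix at a constant rate of 1), and it assumes the common single-participation definitions MeanSE = ||B||_F ||C||_{1->2} / sqrt(n) and MaxSE = (max row norm of B) ||C||_{1->2}. It only evaluates the two trivial factorizations A = A·I and A = I·A; it does not implement the paper's learning-rate-aware factorization, and the decaying schedule is hypothetical.

```python
import math

def workload(learning_rates):
    """Workload A with A[t][s] = learning_rates[s] for s <= t, else 0.

    Row t maps the gradient sequence to the total SGD displacement after
    step t. With a constant rate of 1 this is the prefix-sum matrix.
    """
    n = len(learning_rates)
    return [[learning_rates[s] if s <= t else 0.0 for s in range(n)]
            for t in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def frobenius_norm(m):
    return math.sqrt(sum(x * x for row in m for x in row))

def max_row_norm(m):
    return max(math.sqrt(sum(x * x for x in row)) for row in m)

def max_col_norm(m):
    return max(math.sqrt(sum(row[j] ** 2 for row in m))
               for j in range(len(m[0])))

def mean_se(b, c):
    """Assumed definition: ||B||_F * ||C||_{1->2} / sqrt(n)."""
    return frobenius_norm(b) * max_col_norm(c) / math.sqrt(len(b))

def max_se(b, c):
    """Assumed definition: (max row norm of B) * ||C||_{1->2}."""
    return max_row_norm(b) * max_col_norm(c)

prefix_sum = workload([1.0, 1.0, 1.0, 1.0])
scheduled = workload([1.0, 0.5, 0.25, 0.125])  # hypothetical decaying schedule
eye = identity(4)

for name, a in [("prefix-sum", prefix_sum), ("scheduled", scheduled)]:
    # Two trivial factorizations A = B C: (B, C) = (A, I) and (B, C) = (I, A).
    print(name,
          "| B=A, C=I:", max_se(a, eye), mean_se(a, eye),
          "| B=I, C=A:", max_se(eye, a), mean_se(eye, a))
```

Any better factorization of the same workload can be scored with the same two functions, which is how a schedule-aware construction would be compared against a prefix-sum one.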

2511.09493 2026-05-12 cs.AI cs.LG

Consensus Sampling for Safer Generative AI

Adam Tauman Kalai, Yael Tauman Kalai, Or Zamir

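AI summary This paper studies undetectable risks in generative AI and proposes consensus sampling, a robust aggregation method that improves safety across multiple probability distributions. The method abstains from producing output when the distributions do not agree sufficiently, which safeguards generated results without depending on a specific model architecture. Experiments show that the method is effective on synthetic distributions and image generation tasks, and it comes with theoretical guarantees on adversarial influence and information leakage.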

Abstract

Motivated by undetectable risks in generative AI, we study a general robust aggregation problem: how to aggregate several probability distributions to boost safety. We present consensus sampling, a black-box algorithm that, given $k$ distributions, has risk competitive with the average risk of the safest $s$ of them while abstaining when there is insufficient agreement. This yields an architecture-agnostic approach to generative-model safety when the distributions are induced by models that can sample and evaluate output probabilities. We formalize the guarantee through R-robustness, which also bounds information leakage and adversarial influence. Inspired by robust statistics and the provable copyright protection algorithm of Vyas et al. (2023), we show that while a standard mixture is vulnerable to one unsafe constituent, a pointwise-median construction provides robust intuition, and our efficient sampler is Pareto-optimal for the tradeoff between worst-case risk and abstention. Experiments on synthetic distributions and image generation illustrate the general mechanism and its motivating safety application. The method requires overlap among safe distributions, but it provides a model-agnostic way to inherit guarantees from an unknown reliable subset.

2511.07833 2026-05-12 cs.LG cs.AI

MURPHY: Feedback-Aware GRPO with Retrospective Credit Assignment for Multi-Turn Code Generation

Chanakya Ekbote, Vijay Lingam, Sujay Sanghavi, Jun Huan, Behrooz Omidvar-Tehrani, Anoop Deoras, Stefano Soatto

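AI summary MURPHY is an improved reinforcement learning method for multi-turn code generation, designed to address the shortcomings of conventional GRPO in settings that require multi-turn interaction and feedback-driven correction. It builds feedback-conditioned rollout trees that pair executor feedback with failed candidate solutions and expand them in subsequent turns, enabling feedback-based multi-turn optimization. MURPHY introduces two reward propagation strategies and pruning mechanisms, and achieves absolute gains over existing methods on several benchmarks.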

Comments 21 pages, 2 figures, 8 Tables

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has become a standard recipe for post-training LLMs on reasoning tasks, with Group Relative Policy Optimization (GRPO) emerging as a leading approach. However, GRPO and its variants are inherently single-turn: they optimize from terminal rewards on isolated prompt-response pairs, leaving them poorly suited to agentic settings where models must iteratively refine solutions in response to environmental feedback. We introduce MURPHY, a multi-turn extension of GRPO for self-correcting code generation. MURPHY constructs feedback-conditioned rollout trees in which failed candidate solutions are paired with executor feedback and expanded into subsequent turns, and propagates rewards backward through the tree so that later successful refinements credit earlier attempts that surfaced informative feedback. We study two propagation strategies, Max Reward (MARS) and Mean Reward (MERS), and introduce post-rollout pruning mechanisms that reduce multi-turn optimization cost. Across three code generation benchmarks (HumanEval, MBPP, LiveCodeBench-v6) and two model families (Qwen3-1.7B/4B, OLMo-2-7B), MURPHY delivers up to 6% absolute pass@1 gains over the strongest prior multi-turn execution-feedback methods. Gains are largest on the Medium/Hard subset (+4.38/+4.20 at Iter-5), where iterative self-correction matters more.
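
The backward propagation over a rollout tree can be sketched as follows. This is an illustration only: the abstract does not give the exact update rules, so the `Node` class, the `propagate` function, and the rule "a failed attempt receives the max (or mean) of its refinements' credit" are assumptions standing in for the max-style and mean-style strategies.

```python
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class Node:
    reward: float                                  # reward from executing this candidate
    children: list = field(default_factory=list)   # refinements generated in the next turn
    value: float = 0.0                             # credit assigned after propagation

def propagate(node, combine):
    """Assign credit bottom-up; `combine` merges the children's credit."""
    if not node.children:
        node.value = node.reward
    else:
        node.value = combine([propagate(child, combine) for child in node.children])
    return node.value

def build_tree():
    # A failed first attempt (reward 0) expanded into two refinements, one of which passes.
    return Node(reward=0.0, children=[Node(reward=0.0), Node(reward=1.0)])

print(propagate(build_tree(), max))    # max-style propagation
print(propagate(build_tree(), mean))   # mean-style propagation
```

Under the max rule the first attempt is credited as soon as any refinement succeeds, while the mean rule discounts it by the fraction of refinements that fail.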

2511.07055 2026-05-12 cs.CL cs.IR cs.LG

Complete Evidence Extraction with Model Ensembles: A Case Study on Medical Coding

Katharina Beckh, Sven Heuser, Stefan Rüping

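AI summary This paper studies complete evidence extraction in a medical coding setting, that is, identifying all content in the input text that supports a coding decision. To improve evidence completeness, the authors, motivated by the Rashomon effect, aggregate token-level evidence from multiple language models. Experiments show that model ensembles raise evidence recall at only a small token overhead, and that an ensemble of three models already outperforms the best single model and recovers information that individual models miss.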

Abstract

High-stakes decisions informed by decision support systems require explicit evidence. While prior work focuses on short sufficient evidence, regulatory compliance and medical billing call for complete evidence: all relevant input tokens that support a decision. We formulate complete evidence extraction as a task and study it in a medical coding setting. Motivated by the Rashomon effect, we aggregate token-level evidence from multiple language models to increase evidence completeness. We perform a case study using existing equally-performing models, feature attributions, and a dataset with human-annotated evidence. Our results show that Rashomon ensembles significantly increase evidence recall while incurring only a small token overhead over individual models. Ensembles of only three models already outperform the best single model and recover information that individual models miss.
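
A small sketch of how aggregating token-level evidence can raise recall. The aggregation rule used here (union of each model's top-k attributed tokens) and all attribution values are assumptions for illustration; the abstract does not specify the paper's aggregation scheme.

```python
def top_k_tokens(attributions, k):
    """Indices of the k tokens with the highest attribution score."""
    ranked = sorted(range(len(attributions)),
                    key=lambda i: attributions[i], reverse=True)
    return set(ranked[:k])

def ensemble_evidence(per_model_attributions, k):
    """Union of the top-k evidence tokens of every model (assumed rule)."""
    evidence = set()
    for attributions in per_model_attributions:
        evidence |= top_k_tokens(attributions, k)
    return evidence

def recall(predicted, gold):
    return len(predicted & gold) / len(gold) if gold else 1.0

gold = {1, 2, 4}                      # human-annotated evidence token positions
models = [
    [0.1, 0.9, 0.8, 0.0, 0.0, 0.0],   # model A finds tokens 1 and 2
    [0.0, 0.7, 0.0, 0.0, 0.9, 0.1],   # model B finds tokens 1 and 4
    [0.2, 0.1, 0.9, 0.0, 0.8, 0.0],   # model C finds tokens 2 and 4
]

for attributions in models:
    print("single model recall:", recall(top_k_tokens(attributions, 2), gold))
print("ensemble recall:", recall(ensemble_evidence(models, 2), gold))
print("ensemble size:", len(ensemble_evidence(models, 2)))
```

In this toy case each model alone recovers two of the three gold tokens, and the union recovers all three while selecting only one token more than a single model.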

2511.06894 2026-05-12 cs.LG cs.AI

COGNOS: Universal Enhancement for Time Series Anomaly Detection via Constrained Gaussian-Noise Optimization and Smoothing

Wenlong Shang, Shihao Tian, Xutong Wan, Peng Chang

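AI summary This paper addresses the poor statistical properties of reconstruction residuals common to reconstruction-based time series anomaly detection methods and proposes COGNOS, a general enhancement framework. The method introduces a Gaussian white noise regularization strategy that constrains the model's output residuals to follow a Gaussian white noise distribution during training, and combines it with an adaptive residual Kalman smoother that denoises the anomaly scores, improving detection stability and accuracy. Experiments show that COGNOS effectively improves the performance of several state-of-the-art models.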

Abstract

Reconstruction-based methods are a dominant paradigm in time series anomaly detection (TSAD); however, their near-universal reliance on Mean Squared Error (MSE) loss results in statistically flawed reconstruction residuals. This fundamental weakness leads to noisy, unstable anomaly scores, hindering reliable detection. To address this, we propose Constrained Gaussian-Noise Optimization and Smoothing (COGNOS), a universal, model-agnostic enhancement framework that tackles this issue at its source. COGNOS introduces a novel Gaussian-White Noise Regularization strategy during training, which directly constrains the model's output residuals to conform to a Gaussian white noise distribution. This engineered statistical property creates the ideal precondition for our second contribution: an Adaptive Residual Kalman Smoother that operates as a statistically robust estimator to denoise the raw anomaly scores. Extensive experiments on multiple benchmarks demonstrate that COGNOS consistently and significantly enhances the performance of state-of-the-art backbones, validating the efficacy of coupling statistical regularization with adaptive filtering.
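
As a rough illustration of the denoising step, the sketch below runs a plain scalar Kalman filter (forward pass only, random-walk state model) over a sequence of raw anomaly scores. It is a simplified stand-in, not the paper's Adaptive Residual Kalman Smoother: the fixed noise variances `process_var` and `measurement_var` are assumed values, and no adaptation or backward smoothing pass is included.

```python
def kalman_filter_1d(scores, process_var=1e-3, measurement_var=1e-1):
    """Denoise a score sequence with a scalar random-walk Kalman filter."""
    estimate, variance = scores[0], 1.0
    filtered = [estimate]
    for z in scores[1:]:
        variance += process_var                          # predict: state may drift
        gain = variance / (variance + measurement_var)   # Kalman gain
        estimate += gain * (z - estimate)                # update with the new score
        variance *= 1.0 - gain                           # posterior variance
        filtered.append(estimate)
    return filtered

raw_scores = [0.2, 0.9, 0.1, 0.8, 0.2, 0.9, 0.1, 0.8]   # hypothetical noisy scores
print(kalman_filter_1d(raw_scores))
```

The Gaussian white noise assumption matters here because the Kalman update is the optimal linear estimator only when the measurement noise is white and Gaussian, which is the property the regularizer is meant to enforce on the residuals.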

2511.03368 2026-05-12 cs.LG

TripleWin: Fixed-Point Equilibrium Pricing for Data-Model Coupled Markets

Hongrun Ren, Yun Xiong, Lei You, Yingying Wang, Haixu Xiong, Yangyong Zhu

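AI summary As the machine learning model economy develops, the markets for datasets and pre-trained models have become intertwined, yet existing pricing methods mostly separate data and model transactions or rely on intermediary mechanisms that favor one side. This paper proposes a unified data-model coupled market framework in which supply-side and demand-side mappings form a closed loop between dataset and model quotations, giving bidirectional price propagation and mutual coupling among buyers and among sellers. Based on standard interference function theory, the method guarantees existence, uniqueness, and global convergence of equilibrium prices, and experiments show better convergence efficiency and fairness than conventional methods.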

Abstract

The rise of the machine learning (ML) model economy has intertwined markets for training datasets and pre-trained models. However, most pricing approaches still separate data and model transactions or rely on broker-centric pipelines that favor one side. Recent studies of data markets with externalities capture buyer interactions but do not yield a simultaneous and symmetric mechanism across data sellers, model producers, and model buyers. We propose a unified data-model coupled market that treats dataset and model trading as a single system. A supply-side mapping transforms dataset payments into buyer-visible model quotations, while a demand-side mapping propagates buyer prices back to datasets through Shapley-based allocation. Together, they form a closed loop that links four interactions: supply-demand propagation in both directions and mutual coupling among buyers and among sellers. We prove that the joint operator is a standard interference function (SIF), guaranteeing existence, uniqueness, and global convergence of equilibrium prices. Experiments demonstrate efficient convergence and improved fairness compared with broker-centric and one-sided baselines. The code is available on https://github.com/HongrunRen1109/Triple-Win-Pricing.
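
The convergence guarantee rests on a general property of standard interference functions: when a fixed point exists, iterating p <- I(p) reaches the same one from any starting point. The sketch below shows this on a toy affine map, which is positive, monotone, and scalable. The coupling matrix `G` and the `base` vector are made-up numbers and are unrelated to the paper's supply-side and demand-side mappings.

```python
def fixed_point(interference, p0, tol=1e-12, max_iter=10_000):
    """Iterate p <- I(p) until successive iterates differ by less than tol."""
    p = list(p0)
    for _ in range(max_iter):
        nxt = interference(p)
        if max(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    raise RuntimeError("fixed-point iteration did not converge")

def toy_interference(p):
    """I(p) = base + G p with base > 0 and G >= 0 (hypothetical values)."""
    G = [[0.0, 0.2],
         [0.3, 0.0]]
    base = [1.0, 2.0]
    return [base[i] + sum(G[i][j] * p[j] for j in range(2)) for i in range(2)]

# Different starting prices reach the same equilibrium.
print(fixed_point(toy_interference, [0.0, 0.0]))
print(fixed_point(toy_interference, [50.0, 50.0]))
```

For this toy map the equilibrium can be checked by hand: p1 = 1 + 0.2 p2 and p2 = 2 + 0.3 p1 give p1 = 1.4 / 0.94 and p2 = 2 + 0.3 p1.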

2510.22170 2026-05-12 cs.AI

Measure what Matters: Psychometric Evaluation of AI with Situational Judgment Tests

Alexandra Yost, Shreyans Jain, Shivam Raval, Grant Corser, Allen Roush, Nina Xu, Jacqueline Hammack, Ravid Shwartz-Ziv, Amirali Abdullah

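AI summary This study examines psychometric evaluation of large language model (LLM) behavior using situational judgment tests (SJTs), aiming to determine whether persona-conditioned behavior has stable structure. It proposes a method that combines multidimensional item response theory (MIRT) with structured synthetic personas and treats model responses as observations of latent behavioral variables in order to measure consistent behavioral tendencies. Experiments show that persona-conditioned behavior is stable across scenarios and predicts external benchmarks, indicating that the approach is more reliable than conventional self-report evaluation.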

Comments 100 pages

Abstract

Persona conditioning is widely used to steer large language model (LLM) behavior, but it is unclear whether it induces stable behavioral structure or superficial variation. We propose a framework to measure consistent behavioral tendencies using situational judgment tests (SJTs), multidimensional item response theory (MIRT), and structured synthetic personas, treating responses as observations of latent behavioral variables. Across large-scale SJT and persona datasets, we find that persona-conditioned behaviors are stable across runs, latent trait scores predict external benchmarks (e.g., TruthfulQA, EmoBench), and MIRT reveals consistent latent structure. We validate these results through human annotation, benchmark evaluation, and internal consistency analyses. We interpret these traits not as human personality, but as stable behavioral tendencies expressed across contexts. Our results show that scenario-based psychometric evaluation provides a more reliable alternative to classical self-report approaches for assessing LLM behavior, and we release datasets to support further study.

2510.20036 2026-05-12 cs.CL cs.SE

ToolScope: Enhancing LLM Agent Tool Use through Tool Merging and Context-Aware Filtering

Marianne Menglin Liu, Daniel Garcia, Fjona Parllaku, Vikas Upadhyay, Syed Fahad Allam Shah, Dan Roth

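AI summary Large language model (LLM) agents rely on external tools to carry out complex tasks, but real-world toolsets often contain redundant tools with overlapping names and descriptions, which makes selection ambiguous and lowers accuracy. To address this, the paper proposes ToolScope, which uses tool merging and context-aware filtering to remove redundancy automatically and select the most relevant tools efficiently, fitting within input limits without sacrificing accuracy. Experiments on three mainstream LLMs and three open-source tool-use benchmarks show that ToolScope substantially improves tool selection accuracy, by up to 38.6%.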

Comments ACL Main Conference 2026

Abstract

Large language model (LLM) agents rely on external tools to solve complex tasks, but real-world toolsets often contain redundant tools with overlapping names and descriptions, introducing ambiguity and reducing selection accuracy. LLMs also face strict input context limits, preventing efficient consideration of large toolsets. To address these challenges, we propose ToolScope, which includes: (1) ToolScopeMerger with Auto-Correction to automatically audit and fix tool merges, reducing redundancy, and (2) ToolScopeRetriever to rank and select only the most relevant tools for each query, compressing toolsets to fit within context limits without sacrificing accuracy. Evaluations on three state-of-the-art LLMs and three open-source tool-use benchmarks show gains of 8.38% to 38.6% in tool selection accuracy, demonstrating ToolScope's effectiveness in enhancing LLM tool use.

2510.13830 2026-05-12 cs.CL cs.AI

Users as Annotators: LLM Preference Learning from Comparison Mode

Zhongze Cai, Xiaocheng Li

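AI summary This paper explores a new way to collect large language model (LLM) preference data from users' everyday interactions: annotations made by users in comparison mode replace those of professional annotators. It proposes an expectation-maximization algorithm built on a user behavior model that estimates each user's annotation quality and filters out low-quality data, improving data reliability while retaining the advantage of users' own judgment. In an LLM alignment task the method captures user behavior and filters data well, offering a new approach to preference learning.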

Abstract

Pairwise preference data have played an important role in the alignment of large language models (LLMs). Each sample of such data consists of a prompt, two different responses to the prompt, and a binary label indicating which of the two responses is better. The labels are usually annotated by professional human annotators. In this paper, we consider an alternative approach to collect pairwise preference data -- user annotation from comparison mode. With the increasingly wider adoption of LLMs among the population, users are contributing more and more of their preference labels through their daily interactions with the LLMs. The upside of such labels is that users are the best experts in judging the responses to their own queries/prompts, but the downside is the lack of quality control in these labels. In this paper, we consider a new idea of generating two responses from two different models or two different versions of the same model. The asymmetry allows us to make an inference of the user's data quality through our proposed user behavior model. We develop an expectation-maximization algorithm to estimate a latent quality factor of the user, and filter users' annotation data accordingly. The downstream task shows the effectiveness of our approach in both capturing the user behavior and data filtering for LLM alignment.

2510.11491 2026-05-12 cs.RO cs.LG cs.SY eess.SY

Constraint-Aware Reinforcement Learning via Adaptive Action Scaling

Murad Dawood, Usama Ahmed Siddiquie, Shahram Khorshidi, Maren Bennewitz

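AI summary This study proposes a constraint-aware reinforcement learning method based on adaptive action scaling, which aims to reduce constraint violations during training while maintaining task performance. Unlike conventional approaches that use a single policy or an external safety filter, it introduces a modular cost-aware regulator that smoothly adjusts the agent's actions according to predicted constraint violations, improving safety without disrupting exploration. The method integrates seamlessly with off-policy RL algorithms such as SAC and TD3 and substantially outperforms existing methods on sparse-cost Safety Gym tasks.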

Comments Accepted in 8th Annual Learning for Dynamics & Control Conference (L4DC)

Abstract

Safe reinforcement learning (RL) seeks to mitigate unsafe behaviors that arise from exploration during training by reducing constraint violations while maintaining task performance. Existing approaches typically rely on a single policy to jointly optimize reward and safety, which can cause instability due to conflicting objectives, or they use external safety filters that override actions and require prior system knowledge. In this paper, we propose a modular cost-aware regulator that scales the agent's actions based on predicted constraint violations, preserving exploration through smooth action modulation rather than overriding the policy. The regulator is trained to minimize constraint violations while avoiding degenerate suppression of actions. Our approach integrates seamlessly with off-policy RL methods such as SAC and TD3, and achieves state-of-the-art return-to-cost ratios on Safety Gym locomotion tasks with sparse costs, reducing constraint violations by up to 126 times while increasing returns by over an order of magnitude compared to prior methods.
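
The difference between scaling and overriding an action can be sketched as below. In the paper the regulator is learned; here a hand-written scaling rule stands in for it, and the function name, the `sensitivity` constant, and the `min_scale` floor (a guard against suppressing actions entirely) are all assumptions for illustration.

```python
def scale_action(action, predicted_cost, sensitivity=5.0, min_scale=0.1):
    """Shrink the policy's action as the predicted constraint cost grows.

    The direction of the action is preserved, so the policy still explores;
    only its magnitude is modulated. min_scale avoids degenerate suppression.
    """
    cost = max(predicted_cost, 0.0)
    scale = max(min_scale, 1.0 / (1.0 + sensitivity * cost))
    return [scale * a for a in action]

policy_action = [1.0, -2.0]                    # e.g. an action proposed by SAC or TD3
print(scale_action(policy_action, 0.0))        # no predicted violation: unchanged
print(scale_action(policy_action, 1.0))        # moderate predicted cost: damped
print(scale_action(policy_action, 100.0))      # high predicted cost: floor at min_scale
```

Because the output is always a positive multiple of the policy's action, this kind of regulator can be placed between an off-policy agent and the environment without changing the agent itself.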

2510.09887 2026-05-12 cs.CL

Overconfident and Blind to Details: Fixing Prompt Insensitivity with Abductive Preference Learning

Yijin Ni, Simon Yu, Peng Qi

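AI summary Vision-language models often ignore details when faced with semantically critical input edits and fall back on pretraining priors, which leads to poor benchmark performance. To address this, the paper proposes abductive preference learning, which optimizes the abductive (reverse) policy $\pi(x \mid y)$ to make the model more sensitive to rare prompts. The method substantially improves performance on counterfactual sensitivity tasks without modifying the model architecture, and experiments show larger gains than existing methods on multiple benchmarks.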

Abstract

Vision and language models frequently ignore semantically critical input edits, defaulting to pretraining priors. For example, models will confidently assert that a five-legged dog has four legs; consequently, on the VLMBias benchmark, GPT-5.2 and Claude Sonnet 4.6 achieve only $4.6\%$ and $0\%$ accuracy, respectively. Existing methods address this problem by building datasets that cover the underrepresented inputs to tune the policy function $\pi(y \mid x)$, where $x$ and $y$ refer to input prompts and responses, respectively. However, prompting baselines yield gains of under $3\%$ on VLMBias due to the low probability density of rare prompts. To bypass this bottleneck, we propose abductive preference learning to optimize the abductive policy $\pi(x \mid y)$. We prove this amplifies forward policy improvements by a factor of $q(y)/p(x)$, where $p(\cdot)$ and $q(\cdot)$ denote the marginal probabilities of the prompt and response, yielding the largest gains on the rarest prompts. Furthermore, we demonstrate that for translation-invariant pairwise preference learning methods, such as DPO, estimating $\pi(x \mid y)$ reduces to a structural data swap that compares prompts for a fixed response, requiring no architectural changes. Empirically, abductive preference learning delivers large gains on counterfactual sensitivity: on VLMBias, A-DPO raises accuracy from $3\%$ to $44\%$ ($14\times$), outperforming GPT-5.2 ($4.6\%$) and all closed-source VLMs except Gemini 3 Flash; on Inverse-IFEval, Multi-DPOP reaches $65$--$84\%$, surpassing GPT-5 ($73.7\%$) at the 9B scale while preserving IFBench, unlike DPO which degrades it by $8$--$12\%$.
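
The "structural data swap" can be illustrated with the standard DPO loss. The sketch below is one reading of the abstract, not the authors' code: the loss function is the usual DPO objective, and the abductive variant is assumed to reuse it unchanged while pairing one fixed response with a matching prompt and a mismatched prompt. All log-probability values are made up.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss: -log sigmoid(beta * (policy margin - reference margin))."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return math.log1p(math.exp(-margin))   # equals -log(sigmoid(margin))

# Forward DPO: one prompt x, two responses.
#   logp_w = log pi(y_w | x),  logp_l = log pi(y_l | x)
forward = dpo_loss(logp_w=-4.0, logp_l=-6.0, ref_logp_w=-5.0, ref_logp_l=-5.0)

# Abductive swap (assumed form): one response y, two prompts.
#   logp_w = log pi(y | x_w)   where x_w is the prompt that y actually answers
#   logp_l = log pi(y | x_l)   where x_l is the near-identical prompt it does not
abductive = dpo_loss(logp_w=-4.0, logp_l=-4.5, ref_logp_w=-4.2, ref_logp_l=-4.2)

print(forward, abductive)
```

Only the way training triples are assembled differs between the two calls, which is consistent with the abstract's claim that no architectural change is needed.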

2510.09877 2026-05-12 cs.LG cs.AI stat.ML

Batch Bayesian Active Learning with Partial Batch Label Sampling

Kangping Hu, Stephen Mussmann

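AI summary This paper studies label sampling in batch Bayesian active learning. To address the computational complexity or performance degradation that existing methods face with large batches, it proposes ParBaLS, a partial batch label sampling method derived from Bayesian decision theory and designed for the EPIG algorithm. Experiments show better performance than other methods at a fixed budget, particularly for Bayesian logistic regression on embeddings from large pre-trained models.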

Abstract

Over the past couple of decades, many active learning acquisition functions have been proposed, leaving practitioners with an unclear choice of which to use. Bayesian-based active learning offers principled objectives with explainable intuition, including Expected Error Reduction (EER), Expected Predictive Information Gain (EPIG), and Bayesian Active Learning by Disagreements (BALD). A key challenge of such methods is the difficult scaling to large batch sizes, leading to either computational challenges (BatchBALD) or dramatic performance drops (top-$B$ selection). Here, using a particular formulation of Bayesian Decision Theory, we derive Partial Batch Label Sampling (ParBaLS) for the EPIG algorithm. We show experimentally for several datasets that ParBaLS EPIG gives superior performance for a fixed budget and Bayesian Logistic Regression on embeddings from large pre-trained models. Our code is available at https://github.com/ADDAPT-ML/ParBaLS.
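
For readers unfamiliar with these acquisition functions, the sketch below computes the BALD score (mutual information between the label and the model parameters, estimated from posterior samples) and applies the naive top-$B$ selection that the abstract says degrades at large batch sizes. It does not implement ParBaLS or EPIG; the candidate pool and the probabilities are made up.

```python
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def bald_score(posterior_predictions):
    """Entropy of the mean prediction minus the mean entropy of predictions.

    posterior_predictions holds one class-probability list per posterior sample.
    """
    n = len(posterior_predictions)
    n_classes = len(posterior_predictions[0])
    mean_pred = [sum(sample[c] for sample in posterior_predictions) / n
                 for c in range(n_classes)]
    mean_entropy = sum(entropy(sample) for sample in posterior_predictions) / n
    return entropy(mean_pred) - mean_entropy

def top_b(scores, b):
    """Naive batch selection: indices of the b highest-scoring candidates."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:b]

pool = [
    [[1.0, 0.0], [0.0, 1.0]],   # posterior samples disagree: informative
    [[0.5, 0.5], [0.5, 0.5]],   # samples agree the label is noisy: uninformative
    [[0.9, 0.1], [0.6, 0.4]],   # mild disagreement
]
scores = [bald_score(candidate) for candidate in pool]
print(scores, top_b(scores, 2))
```

Top-$B$ scores every candidate independently, so it can fill a batch with near-duplicates of one informative point; that redundancy is the failure mode batch-aware methods try to avoid.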

2510.09096 2026-05-12 cs.RO cs.AI cs.LG

When a Robot is More Capable than a Human: Learning from Constrained Demonstrators

Xinhu Li, Ayush Jain, Zhaojing Yang, Yigit Korkmaz, Erdem Bıyık

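AI summary This paper studies how a robot can learn a policy better than the human expert's when expert demonstrations are constrained. The authors propose inferring a state-only task reward signal from the constrained demonstrations and self-labeling rewards for unknown states by temporal interpolation, which lets the robot explore shorter, more efficient trajectories. The method beats conventional imitation learning in both sample efficiency and task completion time, and in real-robot experiments it completes the task 10x faster than behavioral cloning.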

Abstract

Learning from demonstrations enables experts to teach robots complex tasks using interfaces such as kinesthetic teaching, joystick control, and sim-to-real transfer. However, these interfaces often constrain the expert's ability to demonstrate optimal behavior due to indirect control, setup restrictions, and hardware safety. For example, a joystick can move a robotic arm only in a 2D plane, even though the robot operates in a higher-dimensional space. As a result, the demonstrations collected by constrained experts lead to suboptimal performance of the learned policies. This raises a key question: Can a robot learn a better policy than the one demonstrated by a constrained expert? We address this by allowing the agent to go beyond direct imitation of expert actions and explore shorter and more efficient trajectories. We use the demonstrations to infer a state-only reward signal that measures task progress, and self-label reward for unknown states using temporal interpolation. Our approach outperforms common imitation learning in both sample efficiency and task completion time. On a real WidowX robotic arm, it completes the task in 12 seconds, 10x faster than behavioral cloning, as shown in real-robot videos on https://sites.google.com/view/constrainedexpert .

2510.08592 2026-05-12 cs.CL cs.AI cs.LG

Less Diverse, Less Safe: The Indirect But Pervasive Risk of Test-Time Scaling in Large Language Models

Shahriar Kabir Nahin, Hadi Askari, Muhao Chen, Anshuman Chhabra

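AI summary This paper studies the safety risk that arises in test-time scaling (TTS) of large language models when the diversity of candidate responses is reduced. It finds that TTS is more likely to produce unsafe outputs when candidate diversity is restricted, a problem confirmed across several mainstream models and strategies. The authors propose a reference-guided diversity reduction protocol (RefDiv) to probe the vulnerability of TTS systems and find that existing safety guardrails offer limited defense against such attacks.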

Comments Accepted to ICML 2026

Abstract

Test-Time Scaling (TTS) improves LLM reasoning by exploring multiple candidate responses and then operating over this set to find the best output. A tacit premise behind TTS is that sufficiently diverse candidate pools enhance reliability. In this work, we show that this assumption in TTS introduces a previously unrecognized failure mode. When candidate diversity is curtailed, even by a modest amount, TTS becomes much more likely to produce unsafe outputs. We present a reference-guided diversity reduction protocol (RefDiv) that serves as a diagnostic attack to stress test TTS pipelines. Through extensive experiments across open-source models (e.g., Qwen3, Mistral, Llama3.1, Gemma3) and two widely used TTS strategies (Monte Carlo Tree Search and Best-of-N), constraining diversity consistently increases the rate at which TTS produces unsafe results. The effect is often stronger than that produced directly by prompts with high adversarial intent scores. This observed phenomenon also transfers across TTS strategies and to closed-source models (e.g., OpenAI o3-mini and Gemini-2.5-Pro), thus indicating that this is a general and extant property of TTS rather than a model-specific artifact. Additionally, we find that numerous widely used safety guardrail classifiers (e.g., Llama-Guard) are unable to flag the adversarial input prompts generated by RefDiv, demonstrating that existing defenses offer limited protection against this diversity-driven failure mode.

2510.06637 2026-05-12 cs.LG cs.AI cs.CV

Control-Augmented Autoregressive Diffusion for Data Assimilation

Prakhar Srivastava, Farrin Marouf Sofian, Francesco Immorlano, Kushagra Pandey, Stephan Mandt

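AI summary This paper proposes an augmented autoregressive diffusion model (ARDM) approach for data assimilation that adds an offline-trained controller to guide generation, improving stability and accuracy while preserving the dynamics of the pretrained diffusion model. Based on stochastic optimal control theory, the method injects small control signals during denoising to correct step by step toward future observations. Experiments show that it significantly outperforms existing methods on chaotic spatiotemporal partial differential equations and real-world weather datasets, with a large gain in computational efficiency.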

Abstract

Despite advances in test-time scaling and diffusion finetuning, guidance for Auto-Regressive Diffusion Models (ARDMs) remains underexplored. We introduce an amortized framework that augments a pretrained ARDM with an offline-trained controller. By previewing future rollouts, the controller learns stepwise corrections that anticipate observations under a terminal-cost objective, yielding a reusable policy for guided generation. Motivated by a stochastic optimal control view of ARDM trajectories, our method injects small controls within each denoising sub-step while staying close to the pretrained dynamics. We study this approach for data assimilation (DA) in chaotic spatiotemporal partial differential equations (PDEs), where existing methods are often computationally expensive and susceptible to forecast drift under sparse observations. At inference, DA becomes a feed-forward rollout with on-the-fly corrections, achieving an order-of-magnitude speedup over strong diffusion-based baselines. Across two canonical PDEs and a compact ECMWF Reanalysis v5 (ERA5) pilot spanning six observation regimes, our method consistently improves stability and accuracy over state-of-the-art alternatives, with similar improvements observed in a larger-scale GenCast study.

2510.05635 2026-05-12 cs.LG cs.CV

NEO: No-Optimization Test-Time Adaptation through Latent Re-Centering

Alexander Murphy, Michal Danilowski, Soumyajit Chatterjee, Abhirup Ghosh

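AI summary This paper proposes NEO, an optimization-free test-time adaptation method that re-centers target data embeddings in latent space and thereby substantially improves the alignment between source-domain and target-domain samples. The method needs no hyperparameter tuning, adds minimal compute, and improves performance from only a few samples; it outperforms 7 existing test-time adaptation methods on several datasets and does well on model calibration and cross-class adaptation. Experiments show that NEO remains efficient while improving the classification accuracy of Vision Transformer models across datasets.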

Comments ICLR 2026

Abstract

Test-Time Adaptation (TTA) methods are often computationally expensive, require a large amount of data for effective adaptation, or are brittle to hyperparameters. Based on a theoretical foundation of the geometry of the latent space, we are able to significantly improve the alignment between source and distribution-shifted samples by re-centering target data embeddings at the origin. This insight motivates NEO -- a hyperparameter-free fully TTA method that adds no significant compute compared to vanilla inference. NEO is able to improve the classification accuracy of ViT-Base on ImageNet-C from 55.6% to 59.2% after adapting on just one batch of 64 samples. When adapting on 512 samples, NEO beats all 7 TTA methods we compare against on ImageNet-C, ImageNet-R and ImageNet-S and beats 6/7 on CIFAR-10-C, while using the least amount of compute. NEO performs well on model calibration metrics and is additionally able to adapt from 1 class to improve accuracy on 999 other classes in ImageNet-C. On Raspberry Pi and Jetson Orin Nano devices, NEO reduces inference time by 63% and memory usage by 9% compared to baselines. Our results, based on 3 ViT architectures and 4 datasets, show that NEO can be used efficiently and effectively for TTA.
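
The re-centering idea can be sketched in a few lines. This toy example assumes the shift is estimated as the mean of one batch of target embeddings and subtracted before a frozen linear head; which layer NEO re-centers and how it estimates the center are not stated in the abstract, and the two-class head and embeddings below are made up.

```python
def recenter(embeddings):
    """Subtract the batch mean so the target embeddings are centered at the origin."""
    n, dim = len(embeddings), len(embeddings[0])
    mean = [sum(e[d] for e in embeddings) / n for d in range(dim)]
    return [[e[d] - mean[d] for d in range(dim)] for e in embeddings]

def predict(embedding, class_weights):
    """Frozen linear head: pick the class with the largest dot product."""
    scores = [sum(w * x for w, x in zip(weights, embedding))
              for weights in class_weights]
    return max(range(len(scores)), key=lambda c: scores[c])

head = [[1.0, 0.0], [-1.0, 0.0]]      # hypothetical frozen 2-class head
shifted = [[6.0, 0.0], [4.0, 0.0]]    # true classes 0 and 1, shifted by +5 on axis 0

print([predict(e, head) for e in shifted])             # shift pushes both to class 0
print([predict(e, head) for e in recenter(shifted)])   # re-centering removes the shift
```

No gradient step or tunable parameter is involved, which matches the abstract's description of the method as optimization-free and hyperparameter-free.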

2510.04233 2026-05-12 cs.LG cs.AI

PAINET: A Principled Efficient Transformer for 3D Dynamics Modeling

Kai Yang, Yuqi Huang, Junheng Tao, Wanyu Wang, Qitian Wu

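AI summary This paper proposes PAINET, a principled and efficient Transformer for modeling the 3D dynamics of multi-body systems. The model is SE(3)-equivariant by design and learns all pairwise interactions; its core components are an attention network inspired by the minimization trajectory of a physical energy function and a parallel decoder that preserves equivariance. Experiments show that PAINET outperforms existing methods on several real-world benchmarks, including human motion capture, molecular dynamics, and protein simulation, with markedly more accurate 3D dynamics prediction.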

Comments 24 pages, published as a conference paper at ICLR 2026

Abstract

Modeling 3D dynamics is a fundamental problem in multi-body systems across scientific and engineering domains and has important practical implications in object trajectory prediction and simulation. While recent GNN-based approaches have achieved strong performance by enforcing geometric symmetries, encoding high-order features or incorporating neural-ODE mechanics, they typically depend on explicitly observed structures and inherently fail to capture the unobserved interactions that are crucial to complex physical behaviors and dynamical mechanisms. In this paper, we propose PAINET, a principled SE(3)-equivariant transformer for learning all-pair interactions in multi-body systems. The model comprises: (1) a novel physics-inspired attention network derived from the minimization trajectory of an energy function, and (2) a parallel decoder that preserves equivariance while enabling efficient inference. Empirical results on diverse real-world benchmarks, including human motion capture, molecular dynamics, and large-scale protein simulations, show that PAINET consistently outperforms recently proposed models, yielding 4.7% to 41.5% error reductions in 3D dynamics prediction with comparable computation costs in terms of time and memory. Our code, baseline models and datasets are available at https://github.com/Icarus1411/PAINET.

2509.26574 2026-05-12 cs.AI cond-mat.other cs.CL hep-th quant-ph

Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark

Minhui Zhu, Minyang Tian, Xiaocheng Yang, Tianci Zhou, Lifan Yuan, Penghao Zhu, Eli Chertkov, Shengyan Liu, Yufeng Du, Ziming Ji, Indranil Das, Qingzhi Chen, Junyi Cao, Yufeng Du, Jiabin Yu, Peixue Wu, Jinchen He, Yifan Su, Yikun Jiang, Yujie Zhang, Chang Liu, Ze-Min Huang, Weizhen Jia, Yunkai Wang, Farshid Jafarpour, Yong Zhao, Xinan Chen, Jessie Shelton, Aaron W. Young, John Bartolotta, Wenchao Xu, Yue Sun, Anjun Chu, Victor Colussi, Chris Akers, Nathan Brooks, Wenbo Fu, Jinchao Zhao, Marvin Qi, Anqi Mu, Yubo Yang, Allen Zang, Yang Lyu, Peizhi Mai, Christopher Wilson, Xuefei Guo, Juntai Zhou, Daniel Inafuku, Chi Xue, Luyu Gao, Ze Yang, Yaïr Hein, Yonatan Kahn, Kevin Zhou, Di Luo, John Drew Wilson, Jarrod T. Reilly, Dmytro Bandak, Ofir Press, Liang Yang, Xueying Wang, Hao Tong, Nicolas Chia, Eliu Huerta, Hao Peng

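AI summary This study introduces CritPt, a new benchmark for evaluating the reasoning ability of large language models in frontier physics research. It contains 71 composite research challenges spanning condensed matter physics, quantum physics, astrophysics, and other fields, all created by active physics researchers from their own research and hand-curated so that answers are machine-verifiable. Experiments show that today's most advanced language models still perform poorly on full research-level tasks, revealing a large gap between current model capabilities and the demands of real physics research.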

Comments 40 pages, 6 figures, 6 tables

Abstract

While large language models (LLMs) with reasoning capabilities are progressing rapidly on high-school math competitions and coding, can they reason effectively through complex, open-ended challenges found in frontier physics research? And crucially, what kinds of reasoning tasks do physicists want LLMs to assist with? To address these questions, we present CritPt (Complex Research using Integrated Thinking - Physics Test, pronounced "critical point"), the first benchmark designed to test LLMs on unpublished, research-level reasoning tasks that broadly cover modern physics research areas, including condensed matter, quantum physics, atomic, molecular & optical physics, astrophysics, high energy physics, mathematical physics, statistical physics, nuclear physics, nonlinear dynamics, fluid dynamics and biophysics. CritPt consists of 71 composite research challenges designed to simulate full-scale research projects at the entry level, which are also decomposed into 190 simpler checkpoint tasks for more fine-grained insights. All problems are newly created by 50+ active physics researchers based on their own research. Every problem is hand-curated to admit a guess-resistant and machine-verifiable answer and is evaluated by an automated grading pipeline heavily customized for advanced physics-specific output formats. We find that while current state-of-the-art LLMs show early promise on isolated checkpoints, they remain far from being able to reliably solve full research-scale challenges: the best average accuracy among base models is only 5.7%, achieved by GPT-5 (high), moderately rising to around 10% when equipped with coding tools. Through the realistic yet standardized evaluation offered by CritPt, we highlight a large disconnect between current model capabilities and realistic physics research demands, offering a foundation to guide the development of scientifically grounded AI tools.

2509.23668 2026-05-12 cs.LG

Hermes: A Multi-Scale Spatial-Temporal Hypergraph Network for Stock Time Series Forecasting

Xiangfei Qiu, Liu Yang, Xiangyu Xu, Hanyin Cheng, Xingjian Wu, Rongjia Wu, Zhigang Zhang, Ding Tu, Chenjuan Guo, Bin Yang, Christian S. Jensen, Jilin Hu

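AI summary Hermes is a multi-scale spatial-temporal hypergraph network for stock time series forecasting that aims to exploit industry correlation more deeply to improve forecasting accuracy. It introduces a hyperedge-based moving aggregation module and a cross-scale, edge-to-edge message passing mechanism to capture lead-lag relationships among industries and multi-scale information. Experiments show that Hermes outperforms existing state-of-the-art methods on multiple real-world stock datasets.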

Abstract

Time series forecasting occurs in a range of financial applications providing essential decision-making support to investors, regulatory institutions, and analysts. Unlike multivariate time series from other domains, stock time series exhibit industry correlation. Exploiting this kind of correlation can improve forecasting accuracy. However, existing methods based on hypergraphs can only capture industry correlation relatively superficially. These methods face two key limitations: they do not fully consider inter-industry lead-lag interactions, and they do not model multi-scale information within and among industries. This study proposes the Hermes framework for stock time series forecasting that aims to improve the exploitation of industry correlation by addressing these limitations. The framework integrates moving aggregation and multi-scale fusion modules in a hypergraph network. Specifically, to more flexibly capture the lead-lag relationships among industries, Hermes proposes a hyperedge-based moving aggregation module. This module incorporates a sliding window and utilizes dynamic temporal aggregation operations to consider lead-lag dependencies among industries. Additionally, to effectively model multi-scale information, Hermes employs cross-scale, edge-to-edge message passing to integrate information from different scales while maintaining the consistency of each scale. Experimental results on multiple real-world stock datasets show that Hermes outperforms existing state-of-the-art methods.

2509.20909 2026-05-12 cs.CL

LogitTrace: Detecting Benchmark Contamination via Layerwise Logit Trajectories

Zirui He, Haiyan Zhao, Yingcong Li, Ali Payani, Mengnan Du

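AI Summary This paper proposes LogitTrace, a new method for detecting possible benchmark contamination in large language models. By analyzing the trajectories of intermediate logits across layers, the method reveals whether a model commits to an answer early by relying on memorization rather than reasoning. Experiments show that LogitTrace separates contaminated examples from clean ones and is robust to rephrased inputs, offering a new perspective for studying memorization behavior in large language models.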

Comments 23 pages, 10 figures, 9 tables

Details
English abstract

Large language models (LLMs) are commonly evaluated on challenging benchmarks such as AIME and Math500, where benchmark contamination can make memorized solutions appear as genuine reasoning. Existing detection methods largely rely on surface overlap, completion behavior, or final-output likelihood, and often degrade when inputs are simply rephrased. In this paper, we propose LogitTrace (Layerwise Logit Trajectories), a framework for analyzing memorization-like decision dynamics through intermediate logit trajectories. Instead of judging memorization only from the final answer, LogitTrace examines how model preferences emerge and stabilize across layers. We find that contaminated examples tend to show earlier commitment, while clean examples exhibit more gradual evidence accumulation. These trajectory signals allow a lightweight classifier to separate contaminated and clean examples across multiple models and input variants. Controlled LoRA injection experiments further show that repeated exposure to target samples induces similar trajectory patterns. Overall, our results suggest that LogitTrace provides evidence beyond surface overlap and final-output confidence, offering a useful lens for studying memorization-like behavior in LLMs.
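
The idea of "earlier commitment" along a layerwise logit trajectory can be illustrated with a small sketch. This is not the authors' implementation: the function names `commitment_layer` and `margin_trajectory`, the toy logits, and the choice of "first layer from which the top-1 token matches the final prediction" as the commitment signal are all assumptions made for illustration. In practice the per-layer logits would come from projecting intermediate hidden states through the model's output head.

```python
from typing import List, Sequence


def commitment_layer(layer_logits: Sequence[Sequence[float]]) -> int:
    """Earliest layer from which the top-1 token equals the final layer's
    top-1 token and never changes again (hypothetical commitment signal)."""
    top = [max(range(len(row)), key=row.__getitem__) for row in layer_logits]
    final = top[-1]
    layer = len(top) - 1
    # Walk backwards while the preferred token still matches the final one.
    while layer > 0 and top[layer - 1] == final:
        layer -= 1
    return layer


def margin_trajectory(layer_logits: Sequence[Sequence[float]], target: int) -> List[float]:
    """Per-layer margin between the target logit and its strongest rival."""
    margins = []
    for row in layer_logits:
        rival = max(v for i, v in enumerate(row) if i != target)
        margins.append(row[target] - rival)
    return margins


# Toy trajectories over 3 layers and a 3-token vocabulary (made-up numbers).
contaminated = [[0.1, 2.0, 0.0], [0.0, 3.0, 0.1], [0.0, 4.0, 0.2]]
clean = [[1.0, 0.5, 0.0], [0.8, 0.9, 0.1], [0.2, 3.0, 0.1]]

print(commitment_layer(contaminated))  # commits at layer 0
print(commitment_layer(clean))         # commits at layer 1
print(margin_trajectory(clean, target=1))
```

Features such as the commitment layer and the margin trajectory are the kind of signal a lightweight classifier could consume; which features the paper actually uses is not stated in the abstract.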

2509.20599 2026-05-12 cs.LG cs.NA math.NA

Explicit and Effectively Symmetric Schemes for Neural SDEs on Lie Groups

Daniil Shmelev, Luke Thompson, Cristopher Salvi

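AI Summary This paper proposes an Explicit and Effectively Symmetric (EES) solver for neural stochastic differential equations (Neural SDEs) on Lie groups, addressing the trade-off between memory efficiency and gradient accuracy in existing methods. By extending EES schemes from ordinary to stochastic differential equations and combining them with Bazavov's commutator-free construction, the method can be implemented efficiently on Lie groups and homogeneous spaces, yielding, to the authors' knowledge, the first explicit near-reversible integrator for this class of problems. Experiments show improved stability and lower memory consumption across Euclidean and manifold-valued problems.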

Details
English abstract

Backpropagation through (neural) SDE solvers is traditionally approached in two ways: discretise-then-optimise, which offers accurate gradients but incurs prohibitive memory costs; and optimise-then-discretise, which achieves constant memory cost by solving an auxiliary backward SDE, but suffers from slower evaluation and gradient approximation errors. Algebraically reversible solvers promise both memory efficiency and gradient accuracy, yet existing methods such as Reversible Heun are often unstable under complex models and large step sizes, and their non-standard auxiliary-state structure obstructs extension to manifold-valued SDEs. Building on the recently introduced Explicit and Effectively Symmetric (EES) schemes - a class of stable, near-reversible explicit Runge-Kutta methods - we address both limitations of existing schemes. We extend EES schemes from ODEs to SDEs and show that they admit an efficient Williamson 2N-storage realisation. Bazavov's commutator-free construction then lifts these schemes to arbitrary Lie groups and homogeneous spaces. To our knowledge, this is the first explicit (near-)reversible integrator in this setting, unlocking the reversible adjoint approach for manifold-valued problems. On Euclidean neural SDE benchmarks, our schemes improve stability under stiff drift and large steps compared with other reversible solvers, while the commutator-free lift reduces memory by up to an order of magnitude on manifold-valued problems versus other baselines. These results establish effectively symmetric integration as a unified, geometry-aware foundation for memory-efficient and stable training of neural SDEs.

2509.19771 2026-05-12 cs.LG cs.AI

Frictional Q-Learning

Hyunwoo Kim, Hyo Kyung Lee

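AI Summary This paper studies extrapolation error in off-policy reinforcement learning and proposes a new algorithm called Frictional Q-Learning. Inspired by static friction, the authors treat the replay buffer as a smooth, low-dimensional action manifold and decompose actions into tangential and normal components to characterize the anisotropy between action support and value sensitivity. The method encodes supported actions with a contrastive variational autoencoder and further improves stability through an orthonormal basis decomposition. Experiments show more robust performance on continuous-control tasks.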

Details
English abstract

Off-policy reinforcement learning suffers from extrapolation errors when a learned policy selects actions that are weakly supported in the replay buffer. In this study, we address this issue by drawing an analogy to static friction. From this perspective, the replay buffer is represented as a smooth, low-dimensional action manifold, where the support directions correspond to the tangential component, while the normal component captures the dominant first-order extrapolation error. This decomposition reveals an intrinsic anisotropy in value sensitivity that naturally induces a stability condition analogous to a friction threshold. To mitigate deviations toward unsupported actions, we propose Frictional Q-Learning, an off-policy algorithm that encodes supported actions as tangent directions using a contrastive variational autoencoder. We further show that an orthonormal basis of the orthogonal complement corresponds to normal components under mild local isometry assumptions. Extensive empirical results on standard continuous-control benchmarks consistently demonstrate robust and stable performance compared with competitive baselines.
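
The tangential/normal split described above can be made concrete with a small projection sketch. It assumes an orthonormal tangent basis is already given; in the paper the supported directions are learned with a contrastive variational autoencoder, which is not reproduced here. The function name `split_tangent_normal` and the toy vectors are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_tangent_normal(
    delta: Sequence[float],
    tangent_basis: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Split an action deviation into its projection onto the span of an
    orthonormal tangent basis and the residual (normal) component."""
    tangent = [0.0] * len(delta)
    for basis_vec in tangent_basis:
        # Coefficient of delta along this (unit-norm) basis direction.
        coeff = sum(d * b for d, b in zip(delta, basis_vec))
        tangent = [t + coeff * b for t, b in zip(tangent, basis_vec)]
    normal = [d - t for d, t in zip(delta, tangent)]
    return tangent, normal


# Toy example: a 3-D action space whose supported directions are the x-y plane.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
tangent, normal = split_tangent_normal([1.0, 2.0, 3.0], basis)
print(tangent)  # component along supported directions
print(normal)   # component leaving the supported set
```

Under this reading, the size of the normal component measures how far a proposed action moves away from replay-buffer support, which is the quantity a friction-like threshold would act on.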

2509.15519 2026-05-12 cs.LG

Fully Decentralized Cooperative Multi-Agent Reinforcement Learning is A Context Modeling Problem

Chao Li, Bingkun Bao, Yang Gao

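AI Summary This paper studies fully decentralized cooperative multi-agent reinforcement learning, in which each agent observes only the states, its own actions, and the shared rewards, and cannot access other agents' actions. This leads to non-stationarity during value function updates and relative overgeneralization during value function estimation, which hinders the learning of effective cooperative policies. To address this, the paper proposes Dynamics-Aware Context (DAC), which models the task as locally perceived by each agent as a Contextual Markov Decision Process and handles both non-stationarity and relative overgeneralization through dynamics-aware context modeling. Experiments show that DAC performs strongly on multiple cooperative tasks, validating its effectiveness.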

Details
English abstract

This paper studies fully decentralized cooperative multi-agent reinforcement learning, where each agent solely observes the states, its local actions, and the shared rewards. The inability to access other agents' actions often leads to non-stationarity during value function updates and relative overgeneralization during value function estimation, hindering effective cooperative policy learning. However, existing works fail to address both issues simultaneously, due to their inability to model the joint policy of other agents in a fully decentralized setting. To overcome this limitation, we propose a novel method named Dynamics-Aware Context (DAC), which formalizes the task, as locally perceived by each agent, as a Contextual Markov Decision Process, and further addresses both non-stationarity and relative overgeneralization through dynamics-aware context modeling. Specifically, DAC attributes the non-stationary local task dynamics of each agent to switches between unobserved contexts, each corresponding to a distinct joint policy. Then, DAC models the step-wise dynamics distribution using latent variables and refers to them as contexts. For each agent, DAC introduces a context-based value function to address the non-stationarity issue during value function updates. For value function estimation, an optimistic marginal value is derived to promote the selection of cooperative actions, thereby addressing the relative overgeneralization issue. Experimentally, we evaluate DAC on various cooperative tasks (including matrix game, predator and prey, and SMAC), and its superior performance against multiple baselines validates its effectiveness.

2509.14234 2026-05-12 cs.LG

Compute as Teacher: Turning Inference Compute Into Reference-Free Supervision

Dulhan Jayalath, Shashwat Goel, Thomas Foster, Parag Jain, Suchin Gururangan, Cheng Zhang, Anirudh Goyal, Alan Schelten

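AI Summary Where should learning signals come from in post-training when ground-truth labels are unavailable? This paper proposes the Compute as Teacher (CaT) framework, which uses inference compute itself as the supervision signal, enabling reinforcement learning without human annotation, particularly in non-verifiable domains such as healthcare guidance. The method generates parallel rollouts, converts them into a pseudo-reference answer, and derives reward signals from self-proposed rubrics, which improves model performance. Experiments show that CaT substantially reduces inference compute cost while maintaining quality and performs well on multiple benchmarks.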

Comments Published as a conference paper at ICML 2026. 23 pages, 6 figures, 12 tables

Details
English abstract

Where do learning signals come from when there is no ground truth in post-training? We show that inference compute itself can serve as supervision. By generating parallel rollouts and converting them into reference estimates, models can learn without human labels. Critically, this holds even in non-verifiable domains like healthcare guidance where no programmatic checker exists. We call this framework Compute as Teacher (CaT): it turns inference-time compute from parallel rollouts into supervision for RL training. The framework has two components: (1) reference estimation which aggregates rollouts into a pseudo-reference answer, and (2) reward derivation which converts that pseudo-reference into RL rewards. For (1), we explore a simple method we call synthesis, but the framework admits any aggregator. For (2), we introduce self-proposed rubrics for non-verifiable domains. These are binary, auditable criteria generated from the pseudo-reference and scored by an LLM judge. On HealthBench, models trained with CaT match or exceed inference-time aggregation quality while using 9x less test-time compute. Here, CaT also competes with learning from expert physician annotations, yielding up to +30% relative improvement over the initial policy. The framework extends naturally to verifiable rewards, matching the best existing baselines on MATH-500 in test-time RL and demonstrating 'drop-in' versatility across both types of domains.
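
The reward-derivation step for non-verifiable domains can be sketched as follows. The abstract states only that rubrics are binary criteria scored by an LLM judge; turning them into a scalar reward as the fraction of criteria satisfied is an assumption made here, and `rubric_reward` and `keyword_judge` are hypothetical names. The keyword matcher is a toy stand-in for the LLM judge.

```python
from typing import Callable, List


def rubric_reward(
    response: str,
    rubrics: List[str],
    judge: Callable[[str, str], bool],
) -> float:
    """Score a response as the fraction of binary rubric criteria it meets.
    (The aggregation rule is an assumption, not taken from the paper.)"""
    if not rubrics:
        return 0.0
    satisfied = sum(1 for criterion in rubrics if judge(response, criterion))
    return satisfied / len(rubrics)


def keyword_judge(response: str, criterion: str) -> bool:
    """Toy stand-in for an LLM judge: checks whether the criterion text
    appears verbatim in the response."""
    return criterion.lower() in response.lower()


# In CaT the rubrics would be generated from the pseudo-reference answer;
# here they are written by hand for illustration.
rubrics = ["see a doctor", "water", "antibiotics"]
response = "Drink water and see a doctor if symptoms persist."
print(rubric_reward(response, rubrics, keyword_judge))  # 2 of 3 criteria met
```

Because each criterion is a yes/no check, the per-criterion judgments remain auditable even when the overall answer cannot be verified programmatically.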

2509.12235 2026-05-12 cs.LG cs.AI

RL Fine-Tuning Heals OOD Forgetting in SFT

Hangzhan Jin, Sitao Luan, Tianwei Ni, Sicheng Lyu, Guillaume Rabusseau, Reihaneh Rabbany, Doina Precup, Mohammad Hamdaqa

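AI Summary This paper studies the effect of fine-tuning large language models with reinforcement learning (RL) after supervised fine-tuning (SFT), showing that while SFT improves in-distribution (ID) reasoning, it often degrades out-of-distribution (OOD) performance. Checkpoint-wise and spectral analyses show that RL does not simply improve on SFT but instead restores OOD capability lost during later SFT, and that this process is associated with rotations of singular vectors. The study offers a new view of post-training dynamics and suggests that controlling singular-vector rotation may improve OOD robustness.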

Comments 31 pages, 22 figures

Details
English abstract

Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) is a standard post-training recipe for improving Large Language Model (LLM) reasoning, but why it works remains unclear. We revisit the common claim that "SFT memorizes, RL generalizes" through checkpoint-wise analyses of in-distribution (ID) and out-of-distribution (OOD) reasoning. We find that OOD performance often peaks early during SFT and then declines despite continued improvement in ID reasoning. RL typically does not surpass this early SFT peak; rather, it restores OOD capability lost during later SFT, and only from a bounded range of SFT checkpoints. Further spectral analysis shows that this forgetting-and-recovery pattern correlates with rotations of singular vectors, while singular values remain largely stable. These findings suggest a more precise view of post-training dynamics: SFT can forget, RL can recover, and controlling singular-vector rotation may improve OOD robustness. Code is available at https://github.com/jinhangzhan/RL_Heals_SFT.

2509.08670 2026-05-12 cs.CV

FractalPINN-Flow: A Fractal-Inspired Network for Unsupervised Optical Flow Estimation with Total Variation Regularization

Sara Behnamian, Rasoul Khaksarinezhad, Andreas Langer

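AI Summary This paper proposes FractalPINN-Flow, an unsupervised deep learning framework that estimates dense optical flow directly from consecutive grayscale frames without ground-truth annotations. At its core is the Fractal Deformation Network (FDN), inspired by fractal geometry and self-similarity, which uses a recursive encoder-decoder structure with skip connections to capture both fine-grained details and long-range motion patterns. Training incorporates total variation (TV) regularization by minimizing an energy functional that combines $L^1$ and $L^2$ data fidelity terms with a TV term, enforcing brightness constancy and smooth flow fields. Experiments show that the model produces accurate, smooth, and edge-preserving flow fields on synthetic and benchmark datasets, and is especially suited to high-resolution and annotation-limited settings.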

Details
Journal ref
In Proceedings of the 2nd Sorbonne-Heidelberg Workshop on AI in Medicine: Machine Learning for Multi-modal Data, Heidelberg University Library, 2025
English abstract

We present FractalPINN-Flow, an unsupervised deep learning framework for dense optical flow estimation that learns directly from consecutive grayscale frames without requiring ground truth. The architecture centers on the Fractal Deformation Network (FDN) - a recursive encoder-decoder inspired by fractal geometry and self-similarity. Unlike traditional CNNs with sequential downsampling, FDN uses repeated encoder-decoder nesting with skip connections to capture both fine-grained details and long-range motion patterns. The training objective is based on a classical variational formulation using total variation (TV) regularization. Specifically, we minimize an energy functional that combines $L^1$ and $L^2$ data fidelity terms to enforce brightness constancy, along with a TV term that promotes spatial smoothness and coherent flow fields. Experiments on synthetic and benchmark datasets show that FractalPINN-Flow produces accurate, smooth, and edge-preserving optical flow fields. The model is especially effective for high-resolution data and scenarios with limited annotations.
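
One plausible way to write the energy functional described above is shown below. The abstract does not give the formula, so the notation is assumed: $I_1, I_2$ are the two consecutive frames, $u$ is the flow field on the image domain $\Omega$, and $\lambda_1, \lambda_2, \alpha \ge 0$ are weighting parameters introduced here for illustration.

```latex
E(u) \;=\;
\lambda_1 \int_\Omega \bigl| I_2(x + u(x)) - I_1(x) \bigr| \, dx
\;+\;
\lambda_2 \int_\Omega \bigl| I_2(x + u(x)) - I_1(x) \bigr|^2 \, dx
\;+\;
\alpha \, \mathrm{TV}(u),
\qquad
\mathrm{TV}(u) = \int_\Omega \lvert \nabla u(x) \rvert \, dx .
```

The first two terms penalize violations of brightness constancy in the $L^1$ and $L^2$ sense, and the TV term favors piecewise-smooth flow while still permitting sharp motion edges.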

2508.20697 2026-05-12 cs.LG cs.CL

Token Buncher: Shielding LLMs from Harmful Reinforcement Learning Fine-Tuning

Weitao Feng, Lixu Wang, Peizhuo Lv, Tianyi Wei, Jie Zhang, Chongyang Gao, Sinong Zhan, Wei Dong

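AI Summary As large language models (LLMs) grow more capable, the risk of harmful use through fine-tuning also increases. Prior work has mostly focused on the threat from supervised fine-tuning (SFT), but this paper systematically shows that, under the same compute budget, reinforcement learning (RL) breaks a model's safety alignment more effectively and facilitates more advanced harmful task assistance. The paper therefore proposes Token Buncher, the first effective defense specifically targeting RL-based harmful fine-tuning. It suppresses model response entropy to undermine the foundation RL relies on, thereby preventing the escalation of harmful behavior, and its effectiveness and generality are validated across multiple models and RL algorithms.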

Comments Project Homepage: https://tokenbuncher.github.io/

Details
English abstract

As large language models (LLMs) continue to grow in capability, so do the risks of harmful misuse through fine-tuning. While most prior studies assume that attackers rely on supervised fine-tuning (SFT) for such misuse, we systematically demonstrate that reinforcement learning (RL) enables adversaries to more effectively break safety alignment and facilitate more advanced harmful task assistance, under matched computational budgets. To counter this emerging threat, we propose TokenBuncher, the first effective defense specifically targeting RL-based harmful fine-tuning. TokenBuncher suppresses the foundation on which RL relies: model response entropy. By constraining entropy, RL-based fine-tuning can no longer exploit distinct reward signals to drive the model toward harmful behaviors. We realize this defense through entropy-as-reward RL and a Token Noiser mechanism designed to prevent the escalation of harmful capabilities. Extensive experiments across multiple models and RL algorithms show that TokenBuncher robustly mitigates harmful RL fine-tuning while preserving benign task performance and finetunability. Our results highlight that RL-based harmful fine-tuning poses a greater systemic risk than SFT, and that TokenBuncher provides an effective and general defense.
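
To make "entropy-as-reward" concrete, here is a minimal sketch that scores a response by the negative mean Shannon entropy of its per-token next-token distributions, so lower-entropy responses receive higher reward. The abstract does not specify the reward's exact form; the sign convention, the averaging over tokens, and the names `token_entropy` and `entropy_reward` are assumptions, and the Token Noiser mechanism is not modeled.

```python
import math
from typing import Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_reward(step_distributions: Sequence[Sequence[float]]) -> float:
    """Negative mean per-token entropy of a response (assumed reward form):
    the more peaked the model's distributions, the higher the reward."""
    if not step_distributions:
        return 0.0
    mean_entropy = sum(token_entropy(p) for p in step_distributions) / len(step_distributions)
    return -mean_entropy


# Toy two-token vocabularies: a peaked response versus a maximally uncertain one.
peaked = [[0.9, 0.1], [0.95, 0.05]]
uniform = [[0.5, 0.5], [0.5, 0.5]]
print(entropy_reward(peaked) > entropy_reward(uniform))  # True
```

Training against a reward of this shape pushes response entropy down, which is the quantity the abstract identifies as the foundation that RL-based fine-tuning exploits.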

2508.17497 2026-05-12 cs.LG cs.AI

Multimodal Representation Learning Conditioned on Semantic Relations

Yang Qiao, Yuntong Hu, Bowen Zhu, Hasibul Haque, Liang Zhao

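AI Summary This paper proposes RCML, a multimodal representation learning framework conditioned on semantic relations, which addresses the problem that conventional contrastive models produce representations that do not change across semantic relations. The method treats semantic relations as explicit conditions and uses natural-language relation descriptions to guide the model toward relation-aware multimodal representations, so that the same sample is represented differently in different contexts. Experiments show that RCML outperforms existing methods on retrieval and classification tasks across multiple datasets, validating the use of semantic relations to guide multimodal learning.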

Details
English abstract

Multimodal representation learning has been largely driven by contrastive models such as CLIP, which learn a shared embedding space by aligning paired image-text samples. While effective for general-purpose representation learning, such models typically produce a single embedding per sample that is reused across different semantic relations and contexts. However, in many real-world applications, relevance between samples is inherently relation-dependent, with different semantic relations emphasizing different aspects of multimodal data. In this work, we propose Relation-Conditioned Multimodal Learning (RCML), a framework that treats semantic relations as explicit conditions of multimodal representation learning. Rather than producing relation-agnostic embeddings, RCML learns representations conditioned on natural-language relation descriptions, allowing the same sample to be represented differently under different relational contexts. The framework constructs relation-aware training pairs, introduces a relation-conditioned module to adapt embeddings to relation semantics, and employs a unified contrastive objective to jointly model cross-modal alignment and relation-induced inter-sample structure. Experiments on multiple datasets show that RCML consistently outperforms strong baselines on retrieval and classification tasks in zero-shot, fine-tuned, and out-of-domain settings, highlighting the effectiveness of leveraging semantic relations to guide multimodal representation learning.

2508.03829 2026-05-12 cs.CL cs.CR

Majority Bit-Aware Watermarking For Large Language Models

Jiahao Xu, Rui Hu, Olivera Kotevska, Zikai Zhang

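AI Summary As large language models (LLMs) are widely deployed, tracing harmful or deceptive content they generate has become an increasingly prominent problem. To this end, the study proposes a new watermark encoding method, majority bit-aware encoding, which relaxes the constraint on green-list size to improve the quality of generated text while preserving watermark signal strength. Experiments on several state-of-the-art LLMs show that the method outperforms existing approaches in both decoding accuracy and text quality.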

Comments Preprint

Details
English abstract

The growing deployment of Large Language Models (LLMs) has raised concerns about their misuse in generating harmful or deceptive content. To address this issue, watermarking methods have been proposed to embed identifiable multi-bit messages into generated text for misuse tracing. However, existing methods often suffer from a fundamental trade-off between text quality and decoding accuracy. In particular, they have to restrict the size of the preferred token set (i.e., green list) during encoding to maintain a detectable watermark signal for decoding, which inevitably degrades generation quality. To improve this trade-off, we propose a novel message encoding paradigm called majority bit-aware encoding, which decouples the watermark signal strength from the green list size. This strategy allows for a strong watermark signal to be preserved in generated texts even when using a large green list. We introduce two instantiations of this paradigm: MajorMark and MajorMark$^{+}$, where the latter is specifically optimized for long messages. Extensive experiments on state-of-the-art LLMs demonstrate that our methods achieve higher decoding accuracy and superior text quality compared to prior baselines.

2508.01191 2026-05-12 cs.AI cs.CL cs.LG

Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

Chengshuai Zhao, Zhen Tan, Pingchuan Ma, Dawei Li, Bohan Jiang, Yancheng Wang, Yingzhen Yang, Huan Liu

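AI Summary This paper examines the effectiveness of Chain-of-Thought (CoT) reasoning in large language models through a data distribution lens, proposing that CoT reasoning is essentially a structured inductive bias learned from training data whose effectiveness depends on the distribution discrepancy between training data and test tasks. Using DataAlchemy, a controllable experimental environment, the study systematically analyzes CoT reasoning across tasks, lengths, and formats, revealing its brittleness when tasks fall outside the training distribution and underscoring the challenge of achieving genuinely generalizable reasoning.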

Comments Accepted by the Association for Computational Linguistics (ACL) 2026 and Foundations of Reasoning in Language Models (FoRLM) at NeurIPS 2025

Details
English abstract

Chain-of-Thought (CoT) prompting has been shown to be effective in eliciting structured reasoning (i.e., CoT reasoning) from large language models (LLMs). Despite its popularity, recent studies expose its failures in some reasoning tasks, raising fundamental questions about the nature of CoT reasoning. In this work, we propose a data distribution lens to understand when and why CoT reasoning succeeds or fails. We hypothesize that CoT reasoning reflects a structured inductive bias learned from in-distribution data, enabling models to conditionally generate reasoning trajectories that approximate those observed during training. As such, the effectiveness of CoT reasoning is fundamentally governed by the nature and degree of distribution discrepancy between training data and test queries. Guided by this lens, we dissect CoT reasoning via three dimensions: task, length, and format. To test the hypothesis, we introduce DataAlchemy, an abstract and fully controllable environment that trains LLMs from scratch and systematically probes them under various distribution conditions. Through rigorous controlled experiments, we reveal that CoT reasoning is a brittle mirage when it is pushed beyond training distributions, emphasizing the ongoing challenge of achieving genuine and generalizable reasoning.