arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.09352 2026-05-12 cs.AI

The Wittgensteinian Representation Hypothesis: Is Language the Attractor of Multimodal Convergence?

Zhaoyang Zhang, Run Shao, Dongyue Wu, Jiajie Teng, Chao Tao, Jingdong Chen, Haifeng Li

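AI summary: This paper examines why independently trained neural networks from different modalities converge to shared representations, and studies the direction of that convergence. The authors propose a directional convergence analysis based on cycle-kNN and find that non-language modalities move toward the structure of language representations more than the reverse, a pattern that holds across model families and scales. The study further finds that language representations occupy a more compact region of representational space, for which the Information Bottleneck framework offers a theoretical interpretation. It concludes with the "Wittgensteinian Representation Hypothesis": the semantic structure of language is the asymptotic attractor of multimodal representation convergence.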

Comments 22 pages, 11 figures, 6 tables

Abstract

Understanding why independently trained neural networks from different modalities converge toward shared representations, and where this convergence leads, remains an open question in representation learning. All existing evidence relies on symmetric similarity measures, which can detect convergence but are structurally blind to its direction. We introduce directional convergence analysis using cycle-kNN, an asymmetric alignment measure, applied across dozens of independently trained unimodal models spanning point clouds, vision, and language. We uncover a consistent directional asymmetry: non-language modalities move toward the neighborhood structure of language significantly more than the reverse, and this pattern holds across all model families and scales--yet is entirely invisible to symmetric measures. Mechanistic analysis traces the directionality to feature density asymmetry, whereby language representations occupy the most compact regions of representational space. The Information Bottleneck framework provides a principled interpretation: optimization under compression drives representations toward discrete, compositional structures characteristic of language. We formalize this as the Wittgensteinian Representation Hypothesis: the semantic structure of language is the asymptotic attractor of multimodal representation convergence.

2605.09350 2026-05-12 cs.AI

CHAINTRIX: A multi-pipeline LLM-augmented framework for automated smart-contract security auditing

Gabriela Dobrita, Simona-Vasilica Oprea, Adela Bara

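AI summary: Smart-contract vulnerabilities have caused billions of dollars in losses, yet security audits remain costly and slow. To address this, the paper proposes Chaintrix, an automated smart-contract security auditing framework that combines multiple pipelines with large language models. Its core idea is to check every LLM-generated finding against a deterministic structural representation of the contract to improve accuracy. The framework introduces a Cross-Contract Interaction Model (CCIM) that parses Solidity code into a structured form, and uses a staged false-positive filter and a structural verification engine to improve detection. On the EVMbench benchmark it reaches 71.7% recall on high-severity vulnerabilities, above the strongest frontier-model baseline.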

Abstract

Smart-contract exploits have caused billions of USD in cumulative losses, yet audits remain expensive and slow. Automated tools have emerged to close this gap, but each class has a characteristic failure mode. Static analyzers report findings that fail manual triage at high rates, while large language models (LLMs) hallucinate findings that contradict the source code. Thus, we propose Chaintrix, an end-to-end auditing framework whose central architectural commitment is that every LLM-generated claim must be discharged against a deterministic structural contract representation. We introduce a Cross-Contract Interaction Model (CCIM) that parses Solidity into a structured map of function-level reads, writes, modifiers, and resolved cross-contract calls. CCIM serves as the substrate against which all 12 of Chaintrix's deterministic signal engines and the parallel LLM audit pipelines operate. A staged false-positive-reduction pipeline, terminating in a Structural Verdict Engine (SVE) that applies deterministic structural checks against parsed code, filters the merged finding set, with selected high-confidence findings further validated through symbolic execution and fuzz testing. We evaluate Chaintrix on EVMbench, the smart-contract security benchmark by OpenAI, Paradigm, and OtterSec. Chaintrix detects 86 of 120 high-severity vulnerabilities (71.7% recall), with 25 audits scoring 100% recall, placing Chaintrix 26 percentage points above the strongest frontier-model baseline.

2605.09348 2026-05-12 cs.CL cs.AI cs.DB cs.MM

HOME-KGQA: A Benchmark Dataset for Multimodal Knowledge Graph Question Answering on Household Daily Activities

Shusaku Egami, Aoi Ohta, Tomoki Tsujimura, Masaki Asada, Tatsuya Ishigaki, Ken Fukuda, Masahiro Hamasaki, Hiroya Takamura

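AI summary: This paper presents HOME-KGQA, a new benchmark dataset for multimodal knowledge graph question answering on household daily activities. Built on a multimodal knowledge graph, the dataset pairs complex multi-hop natural language questions with graph database queries and covers harder tasks such as multi-level spatiotemporal reasoning and multimodal grounding. Experiments show that existing LLM-based KGQA methods perform markedly worse on this dataset, highlighting the challenges KGQA systems still face in real-world settings.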

Comments 12 pages, 4 figures, 7 tables, accepted at LREC2026

Abstract

Large Language Models (LLMs) provide flexible natural language processing capabilities, while knowledge graphs (KGs) offer explicit and structured knowledge. Integrating these two in a complementary manner enables the development of reliable and verifiable AI systems. In particular, knowledge graph question answering (KGQA) has attracted attention as a means to reduce LLM hallucinations and to leverage knowledge beyond the training data. However, existing KGQA benchmark datasets are biased toward encyclopedic knowledge, limited to a single modality, and lack fine-grained spatiotemporal data, which limits their applicability to real-world scenarios targeted by Embodied AI. We introduce HOME-KGQA, a novel KGQA benchmark dataset built on a multimodal KG of daily household activities. HOME-KGQA consists of complex, multi-hop natural language questions paired with graph database query languages. Compared to existing benchmarks, it includes more challenging questions that involve multi-level spatiotemporal reasoning, multimodal grounding, and aggregate functions. Experimental results show that the LLM-based KGQA methods fail to achieve performance comparable to that on existing datasets when evaluated on HOME-KGQA. This highlights significant challenges that should be addressed for the real-world deployment of KGQA systems. Our dataset is available at https://github.com/aistairc/home-kgqa

2605.09347 2026-05-12 cs.AI cs.LO

Dsat: A Native SAT Solver for Discrete Logic

Yaofang Zhang, Ken Zhou, Adnan Darwiche

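AI summary: This paper presents Dsat, a native SAT solver for discrete logic, in which variables can take arbitrary discrete values. It avoids the computational and semantic challenges of the traditional approach of binarizing discrete variables into Boolean ones. The solver borrows mechanisms from Boolean SAT solvers, such as unit resolution and clause learning, but runs them directly on discrete variables. Experiments indicate that Dsat has clear advantages over traditional approaches on discrete CNF problems.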

Comments To Appear at The International Conferences on Theory and Applications of Satisfiability Testing (SAT), 2026

Abstract

Discrete variables are common in many applications, such as probabilistic reasoning, planning, and explainable AI. When symbolic reasoning techniques are brought to bear on these applications, a standard technique for handling discrete variables is to binarize them into Boolean variables to allow the use of Boolean computational machinery such as SAT solvers. This technique can face both computational and semantic challenges, though. In this work, we develop a native SAT solver for discrete logic, which is a direct extension of Boolean logic in which variables can take arbitrary values. Our proposed solver has a similar design to Boolean SAT solvers, with ingredients such as unit resolution and clause learning, but ones that operate natively on discrete variables. We illustrate the merits of the developed SAT solver by comparing it empirically to CSP solvers applied to discrete CNFs, to Boolean SAT solvers applied to binarized CNFs, and to some hybrid solvers.
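
To make "unit resolution that operates natively on discrete variables" concrete, here is a minimal sketch. It is not Dsat's implementation: the literal encoding (a variable paired with a set of allowed values), the function name `unit_propagate`, and the toy variables are all assumptions made for illustration, and clause learning is omitted.

```python
def unit_propagate(domains, clauses):
    """Narrow variable domains by unit resolution; return None on conflict.

    domains: dict mapping variable -> set of values still possible.
    clauses: list of clauses; each clause is a list of (variable, allowed_values)
             literals, read as "variable takes one of allowed_values".
    (Hypothetical encoding chosen for this sketch.)
    """
    domains = {v: set(vals) for v, vals in domains.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            # A literal is falsified once none of its allowed values remain.
            live = [(v, allowed) for v, allowed in clause if domains[v] & allowed]
            if not live:
                return None  # every literal is falsified: conflict
            if len(live) == 1:
                # Unit clause: the one remaining literal must hold.
                v, allowed = live[0]
                narrowed = domains[v] & allowed
                if narrowed != domains[v]:
                    domains[v] = narrowed
                    changed = True
    return domains


domains = {"color": {"red", "green", "blue"}, "size": {"s", "m", "l"}}
clauses = [
    [("color", {"red"})],
    [("color", {"green", "blue"}), ("size", {"m", "l"})],
    [("size", {"s", "m"}), ("color", {"blue"})],
]
result = unit_propagate(domains, clauses)
conflict = unit_propagate(domains, clauses + [[("size", {"l"})]])
```

In this toy instance, propagation fixes `color` to `red` and `size` to `m` without introducing any Boolean indicator variables, and adding a clause that demands `size` be `l` yields a conflict (`None`).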

2605.09346 2026-05-12 cs.CL cs.AI

RuPLaR: Efficient Latent Compression of LLM Reasoning Chains with Rule-Based Priors From Multi-Step to One-Step

Xiaocheng Luo, Kang Wang, Zaifu Zhan, Yuechi Zhou, Xiangyu Duan

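AI summary: This paper proposes RuPLaR, a compression framework that addresses the structural complexity of multi-step or multi-model paradigms in latent chain-of-thought (latent CoT) reasoning. By introducing rule-based prior distributions, the method guides an LLM to generate latent reasoning tokens on its own in a single training stage, eliminating cascaded processes and inter-model dependencies. Experiments show that RuPLaR improves accuracy while preserving reasoning quality and sharply reduces the number of tokens required, indicating good effectiveness and extensibility.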

Comments 15 pages, 15 figures

Abstract

The Chain-of-Thought (CoT) paradigm, while enhancing the interpretability of Large Language Models (LLMs), is constrained by the inefficiencies and expressive limits of natural language. Latent Chain-of-Thought (latent CoT) reasoning, which operates in a continuous latent space, offers a promising alternative but faces challenges from structural complexities in existing multi-step or multi-model paradigms, such as error propagation and coordination overhead. In this paper, we introduce One-Model One-Step, a novel compression framework for Latent Reasoning with Rule-Based Priors (RuPLaR), to address this challenge. Our method trains an LLM to autonomously generate latent reasoning tokens in a single training stage, guided by rule-based prior probability distributions, thereby eliminating cascaded processes and inter-model dependencies. To ensure reasoning quality, we design a joint training objective that enforces answer consistency via cross-entropy, aligns soft tokens with rule-based priors via KL divergence (the Soft Thinking constraint), and adds a problem-thought semantic alignment constraint in the representation space. Extensive experiments show that our compression framework not only improves accuracy by 11.1% over existing latent CoT methods but also achieves this with minimal token usage, underscoring its effectiveness and extensibility. Code: https://github.com/xiaocen-luo/RuPLaR.

2605.09345 2026-05-12 cs.LG

Selection Plateau and a Sparsity-Dependent Hierarchy of Pruning Features

Guangqi Li, Yongxin Li

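AI summary: This paper studies the "Selection Plateau" in one-shot neural network pruning: all rank-monotone weight scorers converge to the same accuracy at a fixed sparsity, regardless of functional form. The authors propose the Sparsity-Information-Complexity Spectrum (SICS) hypothesis, under which escaping the plateau requires features of different complexity at different sparsities, so feature complexity must match the target sparsity. Experiments show that non-monotone features substantially improve pruning at moderate sparsity, while gradient-only or simple Gaussian features help little, indicating that both feature complexity and rank alignment matter for pruning performance.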

Comments 22 pages, 3 figures, 5 tables. Empirical study + framework hypothesis on ViT-Small/CIFAR-10. Cross-domain validation (vision token pruning, KV cache compression, MoE routing) and cross-architecture extensions deferred to follow-up work

Abstract

We identify a Selection Plateau phenomenon in one-shot neural network pruning: all rank-monotone weight scorers converge to identical accuracy at fixed sparsity, independent of functional form. We propose the Sparsity-Information-Complexity Spectrum (SICS) hypothesis: a sparsity-dependent minimum feature complexity kappa(S) governs plateau escape, with kappa=0 sufficient at low sparsity (S<0.65), kappa=1 dominant at critical sparsity (S~0.7), and kappa=2 necessary at extreme sparsity (S>0.75). On ViT-Small/CIFAR-10, testing nine feature classes across four sparsities, smooth non-monotone features provide +6.6% escape at S=0.7, while only raw features with high-frequency wiggle escape at S=0.8 (+2.6%). A fake non-monotone scorer underperforms the gradient baseline, indicating the requirement is magnitude-independent non-monotonicity. A handcrafted Gaussian bump achieves only +0.006 escape vs. chaos-derived +0.046, indicating rank-alignment is necessary but insufficient. SICS provides a unifying explanation for the performance clustering of diverse pruning methods and suggests that future selection algorithms should adapt feature complexity to target sparsity.
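
One simple reading of the plateau can be checked directly: if "rank-monotone" means a scorer preserves the ordering of weight magnitudes, then at a fixed sparsity every such scorer keeps exactly the same weights, so accuracy cannot differ. The sketch below illustrates only this simplest case on synthetic weights. The helper `prune_mask`, the scorer list, and the random weights are assumptions for illustration and are not taken from the paper's experiments.

```python
import math
import random


def prune_mask(weights, scorer, sparsity):
    """Return a keep-mask that retains the top (1 - sparsity) fraction by score."""
    n_keep = len(weights) - int(round(sparsity * len(weights)))
    order = sorted(range(len(weights)),
                   key=lambda i: scorer(weights[i]), reverse=True)
    keep = set(order[:n_keep])
    return [i in keep for i in range(len(weights))]


rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # synthetic stand-in weights

# Four scorers with different functional forms but the same ranking of |w|.
scorers = {
    "magnitude": lambda w: abs(w),
    "squared": lambda w: w * w,
    "cubed": lambda w: abs(w) ** 3,
    "log": lambda w: math.log(abs(w) + 1e-12),
}
masks = {name: prune_mask(weights, s, sparsity=0.7) for name, s in scorers.items()}
all_identical = len({tuple(m) for m in masks.values()}) == 1
```

A scorer can only change the outcome if it reorders weights near the keep/prune threshold, which is consistent with the abstract's emphasis on non-monotone features.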

2605.09344 2026-05-12 cs.RO cs.MA

PECMAN: Perception-enabled Collaborative Multi-Agent Navigation in Unknown Environments

Tianchonghui Fang, Shaunak Roy, Shalabh Gupta

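AI summary: This work addresses collaborative multi-agent navigation in unknown, dynamic environments and proposes PECMAN, a perception-enabled collaborative navigation method. Using distributed tree morphing and shared perception, each agent responds to environmental changes in real time and replans its path while broadcasting newly discovered information to the other agents, improving team coordination. Experiments show that PECMAN substantially reduces team-completion time across multiple scenarios while maintaining a high success rate.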

Abstract

Most path planners assume fully known, static environments, assumptions that fail when robots navigate in dynamic and partially observable environments. SMART-3D addresses these issues by real-time replanning, where it morphs the underlying RRT* tree whenever new obstacles or structures are discovered in the environment. Instead of rebuilding the tree entirely from scratch, SMART-3D prunes invalid nodes and edges and subsequently repairs the disjoint subtrees at hot-nodes to find a new path, thus providing high computational efficiency for real-time adaptability. We extend SMART-3D to perception-enabled collaborative multi-agent navigation (PECMAN) in unknown environments. PECMAN is built upon distributed tree morphing and shared perception strategies, where each agent reacts to environmental changes and morphs its respective tree to replan its path, while simultaneously broadcasting newly discovered structures to other agents, thus enabling them to proactively replan even in areas that have not yet been explored by them. This approach reduces redundant reactions and unnecessary replannings of the agents due to improved situational awareness. The performance of PECMAN was evaluated by 28,000 multi-agent simulations on seven 2D scenarios with different case studies. The results show that PECMAN achieves up to 52% reduction in the team-completion time, while maintaining near 100% success rates. Finally, PECMAN was tested by real experiments on two autonomous robots in a building environment.

2605.09343 2026-05-12 cs.AI

SKG-VLA: Scene Knowledge Graph Priors for Structured Scene Semantics and Multimodal Reasoning for Decision Making

Zeyu Li, Lei Li

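AI summary: In large-scale complaint handling systems, decisions increasingly depend on heterogeneous multi-source evidence such as complaint narratives, screenshots, and order metadata. To address existing systems' underuse of scene structure, rule knowledge, and cross-evidence dependencies, the paper proposes SKG-VLA, which builds a Scene Knowledge Graph (SKG) to represent the entities, evidence, policy clauses, and relations of a complaint scene in one graph. On top of this graph it designs a data synthesis pipeline and a three-stage training strategy to strengthen the model's structured semantic understanding and multimodal decision making. Experiments show that SKG-VLA improves policy-grounded reasoning, complaint decision accuracy, and robustness.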

Abstract

Decision making in large-scale complaint handling systems increasingly relies on heterogeneous evidence, including complaint narratives, screenshots, order metadata, historical interactions, and platform policies. Existing complaint understanding systems mainly perform shallow classification or template matching over isolated modalities, while underutilizing explicit scene structure, rule knowledge, and cross-evidence dependencies. To address this limitation, we present SKG-VLA for multimodal complaint decision making. The core idea is to model each case as a structured complaint scene and represent its decision-relevant semantics with a Scene Knowledge Graph (SKG), which organizes complaint entities, evidence items, policy clauses, temporal events, transactional states, and action-relevant relations into a unified graph. Based on SKG, we build a data synthesis pipeline that generates complaint scene descriptions, rule-consistent graph generalizations, question-answer supervision, and decision recommendations. We further construct a large-scale complaint scene dataset with both text-only and multimodal in-domain benchmarks. Finally, we adopt a three-stage training strategy -- domain-adaptive pre-training, task-oriented instruction fine-tuning, and end-to-end multimodal alignment -- to inject structured scene priors into a multimodal decision model. Experiments show that SKG-VLA consistently improves policy-grounded reasoning, complaint decision accuracy, long-tail generalization, and robustness under incomplete evidence.

2605.09339 2026-05-12 cs.CV cs.AI

Perceptual Asymmetry Between Hue Categories: Evidence from Human Color Categorization

Elnara Kadyrgali, Nuray Toganas, Muragul Muratbekova, Pakizar Shamoi

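AI summary: Human color categories are not uniformly distributed in perceptual space, yet most computational color models still assume fixed, uniform color representations. By analyzing large-scale human color categorization data, this paper extends the COLIBRI fuzzy color model with quantitative measures based on fuzzy membership functions and reveals perceptual asymmetry between hue categories. Yellow occupies a compact, sharply defined region of hue space, whereas green spans a wider interval with a longer transition structure. Human color categories are therefore not only fuzzy but also highly non-uniform in their geometric organization, which offers a new perspective on linguistic color categorization and perceptually grounded color modeling.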

Comments The paper has been submitted for consideration to ICICS 2026 (International Conference on Informatics and Computer Science)

Abstract

Human color categories are not uniformly distributed in perceptual space, yet most computational color models still assume fixed and evenly structured representations. In this paper, we present a focused analytical extension of the COLIBRI fuzzy color model by investigating perceptual asymmetry between hue categories. Using previously collected large-scale human color categorization data, we introduce quantitative measures of category extent and boundary uncertainty, namely Wideness and Boundary Width, derived from fuzzy membership functions at the α = 0.5 level. The analysis reveals a strong imbalance between the two categories: yellow occupies a compact and sharply constrained region of the hue space, whereas green spans a substantially broader interval and exhibits a more extended transition structure. The results show that perceptual color categories are not only fuzzy, but also highly non-uniform in their geometric organization. This asymmetry suggests that some categories behave as narrow, highly specific perceptual labels, while others function as broad, tolerant regions of human color naming. These findings provide a new perspective on linguistic color categorization and extend the interpretability of the COLIBRI framework for perceptually grounded color modeling.

2605.09337 2026-05-12 cs.LG math.OC

Adversary-Robust Learning from Fully Asynchronous Directional Derivative Estimates

Anik Kumar Paul, Nibedita Roy, Nagesh Talagani, Swetha Ganesh, Gugan Thoppe, Alexandre Reiffers-Masson

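AI summary: This paper proposes FAR-SIGN, an asynchronous optimization algorithm for adversary-robust learning in parameter-server/worker systems. The method makes sign-based updates along carefully designed directions and uses a two-timescale mechanism to reduce the resulting bias. FAR-SIGN admits both first-order and zeroth-order implementations, requires no private reference dataset at the server, and supports fully asynchronous execution. Theoretical analysis shows almost-sure convergence to the set of stationary points of smooth nonconvex objectives, and experiments show better accuracy and wall-clock time than existing robust aggregation methods.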

Abstract

We propose FAR-SIGN (Fully Asynchronous Robust optimization via SIGNed directional projections) for adversary-resilient learning in parameter-server--worker systems. FAR-SIGN achieves robustness through sign-based updates along carefully designed directions and mitigates the resulting bias via a two-timescale mechanism. It admits both first-order and zeroth-order implementations and enables fully asynchronous execution without requiring a private reference dataset at the server. We establish almost-sure convergence of FAR-SIGN to the set of stationary points for smooth, nonconvex objectives. Moreover, we prove the near-optimal rate of $O(n^{-1/4+ε})$ in the first-order setting and the standard $O(n^{-1/6+ε})$ in the zeroth-order setting, where $n$ is the iteration count and $ε>0$ can be chosen arbitrarily small. Experiments on MNIST show that FAR-SIGN outperforms robust aggregation-based methods in both accuracy and wall-clock time.
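
As a rough illustration of a sign-based update driven by a zeroth-order directional derivative estimate, the sketch below runs a generic single-worker loop on a toy quadratic. It is not FAR-SIGN: asynchrony, adversarial workers, and the two-timescale bias correction are all left out, and the cyclic coordinate directions, the step size, and every function name are assumptions chosen for illustration.

```python
def directional_derivative(f, x, u, delta=1e-4):
    """Central finite-difference estimate of the derivative of f at x along u."""
    plus = [xi + delta * ui for xi, ui in zip(x, u)]
    minus = [xi - delta * ui for xi, ui in zip(x, u)]
    return (f(plus) - f(minus)) / (2 * delta)


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def signed_directional_descent(f, x0, steps, lr=0.1):
    """Move a fixed step against the sign of the estimated directional derivative."""
    x = list(x0)
    dim = len(x)
    for t in range(steps):
        # Cycle through coordinate axes (a stand-in for the paper's designed directions).
        u = [1.0 if i == t % dim else 0.0 for i in range(dim)]
        s = sign(directional_derivative(f, x, u))
        # Only the sign is used, so the magnitude of the estimate cannot scale the step.
        x = [xi - lr * s * ui for xi, ui in zip(x, u)]
    return x


def quadratic(x):
    """Toy smooth objective with its minimum at the origin."""
    return sum(xi * xi for xi in x)


x_start = [3.0, -2.0]
x_final = signed_directional_descent(quadratic, x_start, steps=200)
```

Because each update has bounded size regardless of the reported derivative, a single corrupted estimate can move the iterate by at most `lr`, which is the intuition behind using signs for robustness. With a constant step the iterate ends up oscillating within about `lr` of the minimizer rather than converging exactly.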

2605.09335 2026-05-12 cs.LG

Functional Graphs for Predicting and Explaining Goal Failure in Sparse Goal-Conditioned RL

Shalley Dash

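AI summary: This work studies policy failure in sparse goal-conditioned reinforcement learning. It analyzes policy behavior through deterministic functional graphs, which expose the attractor and basin structure of a policy. It defines local goal support (LGS), a measure of whether the policy enters the goal from the goal's immediate neighborhood, and finds that LGS is an effective diagnostic of goal failure. It further introduces a taxonomy of policy-induced graphs to identify failure modes beyond local support, providing a structured tool for understanding failures in sparse goal-conditioned RL.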

Comments 9 pages main, 21 pages appendix, 2 figures in main, 8 figures in appendix. Submitted to a conference

Abstract

Sparse goal-conditioned reinforcement learning can produce policies whose failures are hidden by aggregate success rates. We analyze trained goal-conditioned value policies through the deterministic functional graphs induced by greedy evaluation: for each goal, every state maps to a single successor, decomposing behavior into attractors and basins. This reveals a local-to-global structure in learned policies. We define local goal support (LGS), a one-step statistic measuring the fraction of valid neighboring states whose greedy successor is the goal. In deterministic sparse GridWorlds, zero LGS exactly precludes goal entry from non-goal starts. Empirically, weak LGS is a strong diagnostic of goal-level failure across update rules, curricula, larger grids, and bottleneck geometries: the fixed rule LGS <= 0.5 identifies low-success goals with precision 0.921, recall 0.929, and F1 0.925 in the main 8x8 TD setting, with similar performance across variants. However, local support is not sufficient for global success: some supported goals still fail because distant states are captured by competing attractors or fragmented basin structure. We therefore introduce a compact post-hoc taxonomy of policy-induced graphs -- goal-dominant, competitor-dominated, partial/contested, and fragmented -- to characterize residual failure modes beyond local support. These results show that sparse GCRL failures can be understood as structured policy-induced dynamics, and that local one-step policy structure provides a cheap post-training diagnostic for goal-level failure.
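
The LGS statistic and the functional-graph view can be sketched on a toy grid. The code below is a reading of the abstract's definitions rather than the author's code: it assumes 4-connected in-bounds cells count as "valid neighboring states", and the two hand-written successor maps (`good` and `bad`) are hypothetical stand-ins for greedy policies extracted from trained value functions.

```python
def neighbors(state, size):
    """In-bounds 4-connected neighbors of a grid cell (assumed notion of 'valid')."""
    r, c = state
    cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in cand if 0 <= i < size and 0 <= j < size]


def local_goal_support(succ, goal, size):
    """Fraction of the goal's neighbors whose greedy successor is the goal."""
    nbrs = neighbors(goal, size)
    return sum(succ[s] == goal for s in nbrs) / len(nbrs)


def reaches_goal(succ, start, goal):
    """Follow the functional graph from start until the goal or a repeated state."""
    seen = set()
    s = start
    while s not in seen:
        if s == goal:
            return True
        seen.add(s)
        s = succ[s]
    return False  # trapped in an attractor that excludes the goal


def success_rate(succ, goal):
    """Share of non-goal start states whose trajectory enters the goal."""
    starts = [s for s in succ if s != goal]
    return sum(reaches_goal(succ, s, goal) for s in starts) / len(starts)


def step_toward(state, target):
    """Move one cell toward target, fixing the row first and then the column."""
    r, c = state
    tr, tc = target
    if r != tr:
        return (r + (1 if tr > r else -1), c)
    if c != tc:
        return (r, c + (1 if tc > c else -1))
    return state


SIZE = 4
states = [(r, c) for r in range(SIZE) for c in range(SIZE)]
goal = (0, 0)

good = {s: step_toward(s, goal) for s in states}                    # heads to the goal
bad = {(r, c): (r, min(c + 1, SIZE - 1)) for r, c in states}        # always drifts right

lgs_good, rate_good = local_goal_support(good, goal, SIZE), success_rate(good, goal)
lgs_bad, rate_bad = local_goal_support(bad, goal, SIZE), success_rate(bad, goal)
```

In the `bad` map no neighbor of the goal points into it, so LGS is zero and no non-goal start can ever enter the goal, matching the abstract's statement that zero LGS precludes goal entry in deterministic grids. The statistic needs only one successor lookup per neighbor, which is why it is cheap as a post-training check.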

2605.09331 2026-05-12 cs.LG

Dimension-Free Saddle-Point Escape in Muon

Yanlin Long, Yufei Gu, Zeke Xie

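AI summary: This paper studies the optimization bottleneck caused by flat, high-dimensional saddle points in modern LLM training and analyzes the saddle-point escape dynamics of the emerging Muon optimizer. By extending generalized matrix perturbation theory, it develops a theoretical framework showing that Muon sidesteps the curse of dimensionality through a non-linear spectral shaping mechanism, achieving dimension-free saddle-point escape. The analysis avoids isotropic noise assumptions and Tracy-Widom edge singularities, and provides a rigorous mathematical treatment of the non-convex optimization dynamics together with an escape bound.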

Comments 33 pages, 5 figures. Preprint

Abstract

Modern Large Language Model (LLM) training is fundamentally bottlenecked by pathologically flat saddle points in extreme high-dimensional landscapes. Motivated by this challenge, we analyze the saddle-point escape dynamics of the emerging Muon optimizer, demonstrating its resilience against the $\mathcal{O}(D)$ dimensional curse that severely traps element-wise adaptive optimizers like AdamW. By extending generalized matrix perturbation theory, we develop a theoretical framework to capture Muon's non-equilibrium optimization trajectories. This theoretical machinery mathematically proves that Muon elegantly bypasses the dimensional curse via a non-linear spectral shaping mechanism. By leveraging resolvent functional calculus and macroscopic Cauchy contour integration, we avoid isotropic noise assumptions and Tracy-Widom edge singularities. We establish that structural incoherence securely shields the trajectory from orthogonal drift, enabling a dimension-free saddle-point escape, and triggering a deterministic $\mathcal{O}(1)$ discrete ballistic ejection under sufficient spectral gap. Consequently, we provide an algebraically dimension-free escape bound for Muon, formalizing the underlying mechanics of its non-convex optimization dynamics.

2605.09330 2026-05-12 cs.LG cs.AI

The Trap of Trajectory: Towards Understanding and Mitigating Spurious Correlations in Agentic Memory

Luoxi Tang, Rupali Rajendra Vaje, Yuqiao Meng, Sakshi Sunil Narkar, Weicheng Ma, Zeyu Ding, Dazheng Zhang, Zhaohan Xi

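AI summary: This paper studies reasoning errors in agentic memory caused by spurious correlations: information retrieved from long-term memory can carry misleading evidence that degrades later decisions. To address this, the authors propose CAMEL, which calibrates memory at both write and retrieval time. It reduces reliance on spurious correlations while preserving performance on clean inputs, and remains robust under adaptive attacks. The method offers a practical route to more reliable and safer agentic memory systems.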

Abstract

Agentic memory enables LLMs to persist information beyond a single context window and reuse it in later decisions, but it also introduces a new vulnerability: spurious correlations, where retrieved memory carries miscorrelated evidence and propagates erroneous reasoning into downstream decisions. Despite the widespread use of agentic memory, this risk remains largely underexplored. We address it from two aspects. First, we benchmark several canonical types of spurious patterns identified through causal structure and record them across trajectory-level memory. Diagnosing agentic memory systems on this benchmark reveals that memory improves reasoning on clean inputs but amplifies reliance on spurious patterns when they are present. Second, we propose CAMEL, a plug-and-play calibration method that operates across diverse memory architectures at both write and retrieval time. CAMEL consistently reduces reliance on spurious patterns across all three types while preserving or improving performance on clean inputs and staying robust under adaptive attacks targeting the calibration. Overall, CAMEL offers a principled and lightweight solution toward more reliable agentic memory deployment.

2605.09328 2026-05-12 cs.CV

Noise-Started One-Step Real-World Super-Resolution via LR-Conditioned SplitMeanFlow and GAN Refinement

Wei Zhu, Kai Zhang, Yu Zheng, Lei Luo, Yong Guo, Jian Yang

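AI summary: This work proposes SMFSR, a one-step diffusion-based method for real-world image super-resolution that targets the efficiency-quality trade-off of conventional diffusion models. While keeping the noise-started generation process, it uses LR-conditioned SplitMeanFlow to map noise directly to a high-resolution image, and adds a GAN refinement stage to improve the realism of details and the naturalness of the image. Experiments show that SMFSR achieves the best perceptual quality among current one-step diffusion models for real-world super-resolution while retaining efficient single-step inference.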

Abstract

Pre-trained text-to-image (T2I) diffusion models have shown strong potential for real-world image super-resolution (Real-ISR), owing to their noise-started generation process that enables realistic texture synthesis and captures the one-to-many nature of super-resolution. However, diffusion-based Real-ISR methods still face a fundamental efficiency-quality trade-off. Multi-step methods generate high-quality results by iteratively denoising random Gaussian noise under LR conditioning, but suffer from slow sampling. Recent one-step methods greatly improve efficiency, yet they typically replace noise-started generation with direct LR-to-HR restoration, which weakens stochasticity and limits realistic detail synthesis. To address this issue, we propose SMFSR, a noise-started one-step Real-ISR framework via LR-conditioned SplitMeanFlow and GAN refinement. SMFSR preserves the random-noise starting point of diffusion models and learns a direct noise-to-HR mapping conditioned on the LR image. To this end, Interval Splitting Consistency distills the multi-step generative trajectory into a single average-velocity prediction, enabling efficient one-step generation. To compensate for the reduced opportunity for progressive refinement, we further introduce a GAN refinement stage, where a DINOv3-based discriminator enhances realistic texture synthesis and variational score distillation aligns the generated outputs with the natural image distribution under a frozen diffusion teacher. Extensive experiments demonstrate that SMFSR achieves state-of-the-art perceptual quality among one-step diffusion-based Real-ISR methods while retaining fast single-step inference.

2605.09319 2026-05-12 cs.CV cs.LG

PGID: Progressive Guided Inversion and Denoising for Robust Watermark Detection

Minh Quoc Duong, Chun Tong Lei, Chun Pong Lau

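AI summary: As AI-generated images proliferate, digital watermarking has become an important means of protecting intellectual property and preventing malicious exploitation. Existing semantic watermarking methods, however, rely on diffusion inversion for watermark detection and are vulnerable to imprint removal and forgery attacks. This paper proposes PGID, a training-free progressive guided inversion and denoising framework that defends against these attacks. Progressive inversion-denoising cycles project perturbed latents back to the region they originally belonged to, recovering removed watermarks and identifying forged instances.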

Abstract

With the proliferation of AI-generated images, digital watermarking has become an essential safeguard for protecting intellectual property and mitigating malicious exploitation. Recent works on semantic watermarking have enabled efficient copyright protection for diffusion models. However, the dependence of semantic watermarking on diffusion inversion for watermark detection creates a critical vulnerability. Imprint removal and forgery attacks exploit this weakness to produce deceptive results. Our analysis reveals that these attacks succeed by displacing watermarked latents into the unwatermarked region, while guiding unwatermarked latents into the watermarked region. Based on that, we propose Progressive Guided Inversion and Denoising (PGID), the first plug-and-play, training-free noise extraction framework designed to defend against both attack strategies. PGID effectively defends by projecting perturbed latents back to the region where they originally belong. The projection is achieved by eliminating intermediate latent deflections and mitigating adversarial perturbations through progressive inversion-denoising cycles. Comprehensive evaluations across multiple schemes demonstrate that PGID successfully restores detection reliability by recovering removed watermarks and identifying forged instances.

2605.09317 2026-05-12 cs.CL cs.CV cs.LG

Mem-W: Latent Memory-Native GUI Agents

Guibin Zhang, Yaohui Ling, Fanci Meng, Kun Wang, Shuicheng Yan

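AI summary: This paper proposes Mem-W, a new family of GUI agents whose core idea is to treat memory as part of the agent's continuous context rather than as a conventional external auxiliary structure. Using a shared trajectory-to-latent compressor, Mem-W encodes historical trajectories and in-session segments into compact memory tokens and fuses them with the current GUI observation into one continuous embedding sequence, giving the agent a unified view of task progress for decision making. Experiments show that Mem-W improves a range of backbones and memory-enhanced baselines on several web and mobile navigation tasks, with gains of up to +30.0, demonstrating the effectiveness and scalability of latent-native memory for long-horizon GUI operation.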

Abstract

GUI agents are beginning to operate the web, mobile, and desktop as interactive worlds, where successful control depends on carrying forward visual, procedural, and task-level evidence beyond the fleeting present screen. Yet most agents still treat memory as an external, human-readable artifact: histories are summarized, categorized, retrieved, and reinserted as text or structured records before being encoded again by the policy. This creates a mismatch between the representational form in which experience is stored and the latent embedding sequence over which modern GUI policies actually act. We introduce Mem-W, a series of latent-memory-native GUI agents that treat memory as part of the agent's continuous context rather than as an auxiliary symbolic scaffold. Mem-W weaves both historical trajectories (as experiential memory) and in-session segments (as working memory) into compact memory tokens through a shared trajectory-to-latent compressor. These tokens are woven with the current GUI observation and local context into one continuous embedding sequence, allowing the agent to read successes, failures, and unfinished progress through the same machine-native interface. Mem-W is trained with self-distillation and outcome-aware supervision to preserve decision-relevant state while filtering memory toward evidence that truly supports task success. Across four web and mobile navigation benchmarks, Mem-W consistently improves diverse backbones and memory-enhanced baselines, with gains of up to $+30.0$, suggesting that latent-context-native memory can serve as a scalable foundation for long-horizon GUI agency.

2605.09315 2026-05-12 cs.AI cs.CL

Do Self-Evolving Agents Forget? Capability Degradation and Preservation in Lifelong LLM Agent Adaptation

Ye Yu, Xiaopeng Yuan, Haibo Jin, Heming Liu, Yaoning Yu, Haohan Wang

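AI summary: This paper studies the capability degradation that LLM agents suffer while continually adapting to new tasks. It shows that, across the workflow, skill, model, and memory dimensions of evolution, self-evolution can progressively erode previously acquired capabilities. The authors propose Capability-Preserving Evolution (CPE), which constrains destructive capability drift during evolution and improves the stability of existing capabilities while maintaining adaptation performance. Experiments show that CPE mitigates capability degradation across multiple task settings, pointing toward stable, long-horizon self-evolving agents.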

Abstract

Recent advances in LLM agents enable systems that autonomously refine workflows, accumulate reusable skills, self-train their underlying models, and maintain persistent memory. However, we show that such self-evolution is often non-monotonic: adapting to new task distributions can progressively degrade previously acquired capabilities across all major evolution channels. We identify this phenomenon as capability erosion under self-evolution and show that it consistently emerges across workflow, skill, model, and memory evolution. To mitigate this issue, we propose Capability-Preserving Evolution (CPE), a general stabilization principle that constrains destructive capability drift during continual adaptation. Across all four evolution dimensions, CPE consistently improves retained capability stability while preserving adaptation performance. For example, in workflow evolution, CPE improves retained simple-task performance from 41.8% to 52.8% under GPT-5.1 optimization while simultaneously achieving stronger complex-task adaptation. Our findings suggest that stable long-horizon self-evolving agents require not only acquiring new capabilities, but also explicitly preserving previously learned ones during continual adaptation.

2605.09314 2026-05-12 cs.AI

How LLMs Are Persuaded: A Few Attention Heads, Rerouted

Xiangkun Sun, Lingkai Kong, Aoqi Zhang, Liang Zeng, Tonghan Wang

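AI summary: This study examines how large language models are persuaded to abandon factual knowledge and uncovers the internal causal mechanism. The model's answer is determined mainly by a small number of mid-layer attention heads, which encode answer options as vertices of a low-dimensional polyhedron; persuasion is in effect a discrete jump from the correct-answer vertex to the target-answer vertex. Intervention experiments further confirm that persuasion depends on a steerable attention-routing feature, which the authors trace back to shallower attention heads that build it from persuasive keywords in the input. This suggests new ways to monitor and defend against the vulnerability.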

Comments 9 pages, 9 figures

Abstract

Language models can be persuaded to abandon factual knowledge. This vulnerability is central to AI safety, but its internal mechanism remains poorly understood. We uncover a compact causal mechanism for persuasion-induced factual errors. A small set of mid-layer attention heads almost entirely determines the model's answer. These heads write answer options into a low-dimensional polyhedron, with options occupying distinct vertices. Persuasion does not blur belief or merely reduce confidence; it causes a discrete latent jump from the correct-answer vertex to the persuasion-target vertex. We show that decision heads are not reasoning over evidence. Instead, they copy whichever option token their attention selects. Persuasion works by redirecting attention. We isolate a rank-one evidence-routing feature that controls the route. Directly modifying this feature steers the model's choice, and removing it blocks persuasion. We then trace the feature back to a band of shallower attention heads that build it from persuasive keywords in the input. Every step is validated by intervention. This mechanism appears across open-source LLMs and realistic poisoning scenarios such as Generative Engine Optimization, revealing persuasion as a narrow, monitorable circuit.

2605.09312 2026-05-12 cs.CV

Low-Cost Neural Radiance Fields

Alice Huang, Prathamesh Sonawane, Yashdeep Thorat, Yug Rao

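AI summary This paper studies how to speed up training and inference of Neural Radiance Fields (NeRF) when compute and data are limited. The authors compare three accelerated NeRF models and run extension experiments for the low-compute, low-data setting, including adding a depth-supervision loss, simplifying the feature-decoding network, and designing HashNeRF variants with different architectures. Under equal training time, none of the modifications clearly outperforms the existing baselines, but the results show which modifications suit constrained settings and point to directions for future work.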

Comments 7 pages

Details
English abstract

Neural Radiance Fields (NeRF) achieve high-quality novel-view synthesis, but their long training times and reliance on dense input views limit accessibility. We present a comparative study of three accelerated NeRF variants (DS-NeRF, TensoRF, and HashNeRF) and explore extensions targeted at the low-compute, low-data regime. First, we add a depth-supervision loss derived from COLMAP keypoints to TensoRF (TensoRF-DS) and evaluate it on the LLFF dataset under reduced view counts. Second, we ablate the feature-decoding MLP of TensoRF and study the effect of input downsampling on PSNR and runtime on the synthetic Lego scene. Third, we propose four architectural variants of the HashNeRF color and density networks, including residual and convolutional designs, and report PSNR/training-time tradeoffs under matched iteration budgets. Under iso-time evaluation, none of our extensions conclusively outperforms the published baselines, but the experiments characterize which extensions transfer to constrained settings and surface design questions for future work.

2605.09311 2026-05-12 cs.LG cs.AI physics.atom-ph physics.chem-ph physics.comp-ph

Teaching Molecular Dynamics to a Non-Autoregressive Ionic Transport Predictor

Jiyeon Kim, Byungju Lee, Won-Yong Shin

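AI summary This paper addresses fast and accurate prediction of ionic transport properties, which are dynamic material properties. It proposes a non-autoregressive learning framework based on auxiliary modality learning: atomic trajectories serve as auxiliary information during training, so the model captures dynamics without needing trajectory data at inference. The method avoids the slow inference and error accumulation of autoregressive models and the underuse of dynamics in existing non-autoregressive models. On the dataset with trajectory data it runs more than 200 times faster than autoregressive models, and it substantially reduces prediction error.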

Comments International Conference on Machine Learning (ICML 2026) (to appear) (Please cite our conference version.)

Details
English abstract

Unlike most static material properties widely studied in the machine learning literature, ionic transport properties are inherently dynamic, making their fast and accurate prediction from static atomic structures challenging. The current standard approach, molecular dynamics (MD) simulations, suffers from prohibitively high computational cost. Recent autoregressive learning-based MD acceleration methods requiring sequential inference remain slow and prone to error accumulation; in contrast, existing non-autoregressive material property prediction models are less accurate because they fail to exploit dynamics. Moreover, existing methods typically benefit from datasets either with or without atomic trajectories, but not both. To overcome these limitations, we propose a non-autoregressive learning framework based on auxiliary modality learning, which treats atomic trajectories as an auxiliary modality during training but does not require them at inference. This enables the predictor to learn dynamics without sequential inference while benefiting from both types of datasets. As a result, our framework achieves over 200 times speedup compared to autoregressive models on the dataset with atomic trajectories while substantially reducing prediction error relative to non-autoregressive benchmarks across both types of datasets. Our code is available at https://github.com/jykim-git/MD.

2605.09310 2026-05-12 cs.AI q-fin.PM

Beyond ESG Scores: Learning Dynamic Constraints for Sequential Portfolio Optimization

Xin Li, Yan Ke, Longbing Cao

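AI summary This paper studies how to incorporate environmental, social, and governance (ESG) factors into portfolio optimization more effectively for sustainable investing. Rather than treating ESG as a static score, as conventional methods do, the authors propose a dynamic constraint-learning approach: a Multimodal Action-Conditioned Constraint Field (MACF) learns mechanism-specific ESG costs from point-in-time multi-source data, and MACF-X adapters convert these constraints into interfaces the optimizer can consume. The method reduces ESG budget pressure while maintaining good financial performance, and experiments show that the gain depends on dynamic evidence inputs and the three-head decomposition.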

Details
English abstract

ESG-aware portfolio optimization is increasingly important for sustainable capital allocation, yet most learning-based methods still operationalize ESG by appending static scores to the policy observation or reward. This creates a mismatch for sequential control: ESG scores are noisy, provider-dependent, low-frequency, and temporally misaligned with sequential portfolio decisions, while financial evidence suggests that ESG is better treated as a portfolio preference, risk-exposure, or hedge dimension than as a robust alpha factor. We propose to impose ESG constraints without modifying the financial policy's observation or reward, using a Multimodal Action-Conditioned Constraint Field (MACF) that learns mechanism-specific ESG costs from point-in-time multimodal evidence and contemplated portfolio transitions. We then introduce MACF-X, a family of optimizer-specific adapters that converts MACF costs and uncertainties into native constrained-optimization interfaces through a shared slack- and uncertainty-aware pressure layer. Across multiple constraint-integration interfaces, MACF-X reduces tail ESG budget pressure while maintaining competitive financial performance. Ablations show that this improvement depends on dynamic evidence inputs and three-head decomposition, while static ESG-score proxies are nearly indistinguishable from score-shuffled noise baselines.

2605.09308 2026-05-12 cs.LG cs.AI

Hierarchical Attention-based Graph Neural Network with Relevance-driven Pruning

Seungwoo Kum

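AI summary This paper proposes a Hierarchical Attention-based Heterogeneous Graph Neural Network (HA-HeteroGNN) to address two problems of graph neural networks: limited interpretability across heterogeneous node types and the high computational cost of large, noisy graphs. Through a unified explainability-to-pruning pipeline, a two-tier attention mechanism separates sensor-level and context-level computation and produces node relevance scores, which then serve as the pruning criterion, reducing the number of graph edges while improving classification accuracy. Experiments show that the method keeps classification performance high while significantly reducing training time and inference latency, supporting its effectiveness in practical applications.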

Details
English abstract

Graph Neural Networks (GNNs) excel at relational reasoning but face two persistent challenges: the lack of interpretable attribution for heterogeneous node types, and the computational overhead of message passing over large, noisy graphs. We propose the Hierarchical Attention-based Heterogeneous GNN (HA-HeteroGNN), a framework that addresses both issues through a unified explainability-to-pruning pipeline. A two-tier attention mechanism separates sensor-level and context-level computation across 16 node types and 18 edge types, producing per-node relevance scores via an attention-based GNN Explainer without requiring gradient backpropagation. These relevance scores then serve as a principled pruning criterion: removing nodes identified as consistently uninformative yields a 27% reduction in graph edges while simultaneously improving classification accuracy by 2.4-6.1% across all model variants, challenging the conventional assumption that pruning necessarily trades accuracy for efficiency. Experiments on a 50,000-record synthetic dataset spanning 11 report categories demonstrate 97.5% cross-strategy explanation stability and domain-consistent sensor attribution, with training-time reductions of up to 43.9% and real-time inference latency of approximately 58-60 ms per sample.

2605.09303 2026-05-12 cs.LG

Path-Dependent Denoising: A Non-Conservative Field Perspective on Order Collapse in Diffusion Language Models

Jeonseong Kim

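AI summary Diffusion language models (DLMs) offer a structural alternative to autoregressive generation, allowing tokens to be updated in arbitrary order or in parallel. In practice, however, their decoding remains highly order-dependent and often behaves like autoregressive generation. Taking a non-conservative field perspective, this paper introduces the notion of path-dependent denoising, exposes the compatibility problem between local denoising conditionals and global order, and builds an inference-time analysis framework for diagnosing whether DLM decoding is truly order-free.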

Details
English abstract

Diffusion language models (DLMs) offer a structural alternative to autoregressive generation: denoising can update tokens in arbitrary orders or in parallel rather than along a fixed left-to-right chain. In practice, fast DLM decoding remains strongly order-sensitive and often drifts toward autoregressive-like trajectories. We trace this tension to compatibility. At each reverse-time step, a DLM provides local denoising conditionals over the unresolved tokens. Arbitrary-order denoising becomes well defined when these local conditionals compose into order-invariant pseudo-joints. We formalize this view by defining order-induced pseudo-joints and a local denoising circulation: the log-ratio between the two pseudo-joints obtained by swapping a pair of unresolved positions. This circulation is zero under compatible conditionals, and global order gaps decompose into sums of local circulations along adjacent swaps. We further separate incompatibility-driven path dependence from conditional-dependence error in parallel updates and from order-specific estimation error. The resulting framework provides inference-only diagnostics for testing when DLM decoding is genuinely order-free.
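
The local denoising circulation described above can be illustrated on a toy case. The sketch below assumes just two unresolved binary positions `i` and `j` with hand-written conditionals; the tables and function name are hypothetical and are not the paper's code.

```python
import numpy as np

def circulation(p_i, p_j, p_j_given_i, p_i_given_j, xi, xj):
    """Log-ratio of the pseudo-joints for orders (i then j) and (j then i)."""
    order_ij = np.log(p_i[xi]) + np.log(p_j_given_i[xi, xj])
    order_ji = np.log(p_j[xj]) + np.log(p_i_given_j[xj, xi])
    return order_ij - order_ji

# Compatible case: every conditional is derived from one true joint[xi, xj].
joint = np.array([[0.4, 0.1],
                  [0.2, 0.3]])
p_i = joint.sum(axis=1)                     # marginal of position i
p_j = joint.sum(axis=0)                     # marginal of position j
p_j_given_i = joint / p_i[:, None]          # indexed [xi, xj]
p_i_given_j = (joint / p_j[None, :]).T      # indexed [xj, xi]
compatible = circulation(p_i, p_j, p_j_given_i, p_i_given_j, 0, 1)

# Incompatible case: replace one conditional with a table that ignores x_i.
uniform = np.full((2, 2), 0.5)
incompatible = circulation(p_i, p_j, uniform, p_i_given_j, 0, 1)
```

When all conditionals come from a single joint, both orders reproduce that joint and the circulation vanishes; the perturbed table gives a nonzero value, which is the kind of signal an inference-only diagnostic would look for.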

2605.09302 2026-05-12 cs.LG cs.CV

Discrete Langevin-Inspired Posterior Sampling

Chaitanya Amballa, Sattwik Basu, Jorge Vančo Sampedro, Romit Roy Choudhury

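AI summary This paper studies posterior sampling for inverse problems in discrete state spaces using discrete diffusion models as generative priors. Existing methods mostly rely on continuous relaxations, Gibbs updates, or mechanisms tied to particular corruption processes, which limits their scalability and generality. The authors propose ΔLPS, a posterior sampler based on discrete Langevin dynamics that uses gradient information to sample efficiently without leaving the discrete state space, supports parallel updates across all dimensions, and works with discrete diffusion models trained in different ways. Experiments show that it outperforms existing discrete diffusion posterior samplers on tasks such as image restoration and spatial mapping and is competitive with continuous diffusion methods.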

Details
English abstract

We study posterior sampling for inverse problems in discrete state spaces using discrete diffusion models as generative priors. While continuous diffusion models have become widely used for inverse problems, their discrete counterparts remain comparatively underexplored. Existing discrete posterior samplers often rely on continuous relaxations of discrete variables, Gibbs-style updates, or mechanisms specialized to particular corruption processes, which can limit scalability or generality. We propose $Δ$LPS, a Discrete Langevin-Inspired Posterior Sampler that uses gradient information to identify promising discrete moves without leaving the discrete state space. The resulting approach enables efficient parallel updates across all token dimensions and is agnostic to the training paradigm of the discrete diffusion prior, including masked and uniform-state diffusion. We evaluate our method on image restoration tasks across MNIST, CIFAR, and FFHQ, as well as spatial mapping, covering linear, nonlinear, and blind inverse problems. Across these settings, we improve over recent discrete diffusion posterior samplers and are competitive with strong continuous diffusion-based inverse solvers. Our results suggest that fully discrete, gradient-informed posterior samplers offer a scalable and general path toward solving inverse problems over discrete representations.

2605.09301 2026-05-12 cs.LG cs.AI

Neural Cluster First, Route Second: One-Shot Capacitated Vehicle Routing via Differentiable Optimal Transport

Samuel J. K. Chin, Maximilian Schiffer

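AI summary This paper proposes Neural CFRS, a neural "cluster first, route second" method for the Capacitated Vehicle Routing Problem (CVRP). The method moves beyond conventional autoregressive decoding: a differentiable optimal transport layer handles global fleet-capacity constraints end-to-end, enabling efficient one-shot decoding. Compared with existing methods, Neural CFRS is highly parameter-efficient, robust on large-scale and out-of-distribution instances, and achieves competitive optimality gaps on standard benchmarks.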

Comments 30 pages, 9 figures

Details
English abstract

The Capacitated Vehicle Routing Problem (CVRP) underpins modern last-mile logistics. Current Neural Combinatorial Optimization (NCO) methods construct CVRP solutions autoregressively, inheriting sequential decoding bottlenecks, sensitivity to spatial symmetries, and brittle out-of-distribution behavior. We revisit the classical Cluster-First-Route-Second (CFRS) paradigm -- long known to be asymptotically optimal but largely overlooked by NCO -- and argue that it is structurally aligned with the core strengths of deep learning: similarity and assignment over global context, rather than the construction of long sequential tours. We introduce Neural CFRS, the first purely non-autoregressive one-shot neural CFRS framework for the CVRP. It enforces global fleet-capacity constraints end-to-end via a differentiable entropic Optimal Transport layer, producing a continuous transport plan to sparsify an exact capacitated assignment solver. We provide formal theoretical guarantees that our architecture intrinsically abstracts away $E(2)$ spatial, inter-route permutation, and intra-route traversal symmetries. By equipping the framework with a pre-trained spatial vocabulary, we unlock extreme parameter efficiency and zero-shot scaling. Designed primarily for real-world spatial distributions under a constant capacity setting, Neural CFRS scales robustly to out-of-distribution $N=1000$ instances with a < 4% gap -- retaining an approximate 5% gap at this scale even as an ultra-lightweight, single-layer architecture. Furthermore, when deployed out-of-the-box on standard benchmarks, we achieve a highly competitive 2.73% optimality gap on size-100 problems.
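
For readers unfamiliar with entropic optimal transport, the following is a minimal Sinkhorn sketch of how a soft customer-to-vehicle assignment can respect capacity marginals. It is a generic textbook iteration under assumed toy inputs (random customer locations, hypothetical cluster seeds, uniform demands), not the Neural CFRS layer itself.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.5, n_iters=1000):
    """Entropic OT plan with row marginals a and column marginals b."""
    K = np.exp(-cost / eps)            # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)                # rescale rows toward a
        v = b / (K.T @ u)              # rescale columns toward b
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
customers = rng.uniform(size=(6, 2))   # toy customer coordinates
seeds = rng.uniform(size=(2, 2))       # hypothetical route/cluster seeds
cost = ((customers[:, None, :] - seeds[None, :, :]) ** 2).sum(-1)

demand = np.full(6, 1 / 6)             # normalized customer demand
capacity = np.full(2, 1 / 2)           # normalized vehicle capacity
plan = sinkhorn(cost, demand, capacity)
```

Each row of `plan` spreads one customer's demand over vehicles, and each column sums to that vehicle's capacity share. Because every step is a differentiable matrix operation, gradients can flow through the plan, which is what makes this kind of layer usable end-to-end.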

2605.09296 2026-05-12 cs.CV cs.AI cs.LG

Micro-Defects Expose Macro-Fakes: Detecting AI-Generated Images via Local Distributional Shifts

Boxuan Zhang, Jianing Zhu, Qifan Wang, Jiang Liu, Ruixiang Tang

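AI summary Recent generative models can produce highly realistic images, making it increasingly hard to tell real images from AI-generated ones. Existing detectors built on pre-trained feature extractors tend to over-rely on global semantic information and overlook critical micro-defects. This paper proposes MDMF, a detection framework based on local distributional differences that amplifies small statistical irregularities in an image to expose macro-level distributional discrepancies in AI-generated images, significantly improving detection performance. Experiments show that MDMF outperforms existing methods on multiple benchmarks.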

Comments 41 pages, 10 figures

Details
English abstract

Recent generative models can produce images that appear highly realistic, raising challenges in distinguishing real and AI-generated images. Yet existing detectors based on pre-trained feature extractors tend to over-rely on global semantics, limiting sensitivity to the critical micro-defects. In this work, we propose Micro-Defects expose Macro-Fakes (MDMF), a local distribution-aware detection framework that amplifies micro-scale statistical irregularities into macro-level distributional discrepancies. To avoid localized forensic cues being diluted by plain aggregation, we introduce a learnable Patch Forensic Signature that projects semantic patch embeddings into a compact forensic latent space. We then use Maximum Mean Discrepancy (MMD) to quantify distributional discrepancies between generated and real images. Our theory-grounded analysis shows that patch-wise modeling yields provably larger discrepancies when localized forensic signals are present in generated images, enabling more reliable separation from real images. Extensive experiments demonstrate that MDMF consistently outperforms baseline detectors across multiple benchmarks, validating its general effectiveness. Project page: https://zbox1005.github.io/MDMF-project/
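
To make the Maximum Mean Discrepancy step concrete, here is a minimal sketch of the standard biased MMD² estimator with an RBF kernel, applied to toy "patch embeddings". The Gaussian data, bandwidth, and function names are assumptions for illustration; the paper's Patch Forensic Signature and kernel choice are not reproduced.

```python
import numpy as np

def rbf_kernel(x, y, bandwidth):
    """Pairwise RBF kernel between rows of x (n, d) and y (m, d)."""
    sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dist / (2 * bandwidth ** 2))

def mmd2(x, y, bandwidth=2.0):
    """Biased estimate of squared MMD between samples x and y."""
    return (rbf_kernel(x, x, bandwidth).mean()
            + rbf_kernel(y, y, bandwidth).mean()
            - 2 * rbf_kernel(x, y, bandwidth).mean())

rng = np.random.default_rng(0)
real_a = rng.normal(size=(200, 4))           # embeddings of real patches
real_b = rng.normal(size=(200, 4))           # a second real sample
fake = rng.normal(size=(200, 4)) + 1.0       # shifted "generated" patches

same_dist = mmd2(real_a, real_b)
diff_dist = mmd2(real_a, fake)
```

Two samples from the same distribution give a value near zero, while a distributional shift yields a clearly larger one, so a threshold on this statistic can separate the two cases.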

2605.09295 2026-05-12 cs.CL

LEAF-SQL: Level-wise Exploration with Adaptive Fine-graining for Text-to-SQL Skeleton Prediction

Zhao Tan, Xiping Liu, Qing Shu, Qizhi Wan, Dexi Liu, Changxuan Wan

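AI summary LEAF-SQL is a new framework for Text-to-SQL skeleton prediction that targets the difficulty of exploring query structure when generating complex queries. It recasts skeleton prediction as a coarse-to-fine tree search in which a three-level skeleton hierarchy, a skeleton formulation agent, and a skeleton evaluation agent work together to produce a structurally diverse, granularity-adaptive search. Experiments show that LEAF-SQL significantly improves several large language models on complex queries and, on the BIRD benchmark, achieves higher execution accuracy than existing methods.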

Details
English abstract

Text-to-SQL translates natural language questions into executable SQL queries, enabling intuitive database access for non-experts. While large language models achieve strong performance on Text-to-SQL with prompting, they still struggle with complex queries that involve deeply nested logic or multiple clauses. A widely used approach employs SQL skeletons--intermediate representations of query logic--to streamline generation, but existing methods are limited by their reliance on a single structural hypothesis and lack of progressive reasoning. To overcome these limitations, we propose LEAF-SQL, a novel framework that reframes skeleton prediction as a coarse-to-fine tree search process. LEAF-SQL enables systematic exploration of diverse structural hypotheses with adaptive refinement. Several key techniques are employed in LEAF-SQL: (1) a three-level skeleton hierarchy to guide the search, (2) a Skeleton Formulation Agent to generate diverse candidates, and (3) a Skeleton Evaluation Agent to efficiently prune the search space. This integrated design yields skeleton candidates that are both structurally diverse and granularity-adaptive, providing a stronger foundation for the SQL generation. Extensive experiments show that LEAF-SQL consistently improves the performance of various LLM backbones. On the official hidden test set of the challenging BIRD benchmark, our method achieves 71.6 execution accuracy, which outperforms leading search-based and skeleton-based methods, affirming its effectiveness for complex queries.

2605.09294 2026-05-12 cs.LG cs.AI

Towards Effective Theory of LLMs: A Representation Learning Approach

Muhammed Ustaomeroglu, Guannan Qu

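AI summary This paper proposes Representational Effective Theory (RET), a framework that learns macrostates from the hidden-state trajectories of large language models so that their computation can be described in terms of high-level structure. It uses a BYOL/JEPA-style self-supervised objective to coarse-grain activations into macrovariables that preserve information relevant for prediction and interpretation. Experiments show that these macrovariables reveal "mental-state" trajectories during reasoning, capture high-level semantic structure, and support early prediction of behavioral outcomes and controllable intervention, providing a useful description for understanding and steering large language models.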

Comments Project webpage: https://ustaomeroglu.github.io/RET/

Details
English abstract

We propose Representational Effective Theory (RET), a framework for describing large language model computation in terms of learned macrostates rather than microscopic details. RET learns these macrostates from hidden-state trajectories using a BYOL/JEPA-style self-supervised objective, coarse-graining activations into macrovariables that preserve higher-level structure relevant for prediction and interpretation. We evaluate whether these macrovariables are practically relevant for interpretability: RET yields temporally consistent states that reveal "mental-state" trajectories of reasoning, capture high-level semantic structure, support early prediction of behavioral outcomes such as sycophancy, and provide causal handles for steering generations toward interpretable computational phases. Together, these results suggest that LLM computation admits useful effective descriptions via RET: high-level, dynamically meaningful variables that support interpretation, prediction, and intervention.

2605.09292 2026-05-12 cs.AI cs.CY

Beyond Accuracy: Evaluating Strategy Diversity in LLM Mathematical Reasoning

Xia Yang, Xuanyi Zhang, Hao Hu, Feng Ji

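AI summary This study examines strategy diversity, beyond answer accuracy, in the mathematical reasoning of large language models. It proposes a strategy-level evaluation framework that uses 80 AMC 10/12 and AIME problems and 217 AoPS reference strategies to analyze the diversity and validity of model-generated strategies. Although models achieve high accuracy under a single-solution prompt, under a multiple-strategy prompt their strategy coverage falls well short of the human reference, and strategy generation differs markedly across models in areas such as Geometry and Number Theory. Models do generate some novel strategies, but overall they still fail to cover the human strategies, revealing a limit on the flexibility of current models' mathematical reasoning.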

Details
English abstract

Large language models now achieve high final-answer accuracy on mathematical reasoning benchmarks, but accuracy alone does not capture reasoning flexibility. We introduce a strategy-level evaluation framework instantiated on 80 AMC 10/12 and AIME problems with 217 AoPS-derived reference strategy families. Model outputs are annotated for strategy identity, validity, and correctness using dual-AI coding with human adjudication. Across four frontier models, we find a pronounced decoupling between answer accuracy and strategy diversity. Under a single-solution prompt, all models achieve high accuracy (95%-100%), but under a multiple-strategy prompt they recover substantially fewer strategies than the human reference set. Gemini, DeepSeek, GPT, and Claude generate 184, 152, 151, and 110 distinct valid strategies, respectively, with the largest gaps in Geometry and Number Theory. The models collectively produce 50 benchmark-novel valid strategies, indicating both incomplete coverage of human strategies and some capacity for alternative reasoning. A repeated-run robustness check on 20 problems shows diminishing gains in discovered strategies, with the strongest model recovering only 39 of 55 AoPS-reference strategies (71%) after three runs. These findings position strategy diversity as a complementary dimension for evaluating mathematical reasoning beyond answer correctness.

2605.09291 2026-05-12 cs.LG stat.AP

dFlowGRPO: Rate-Aware Policy Optimization for Discrete Flow Models

Zhengyan Wan, Yidong Ouyang, Panwen Hu, Qiang Sun

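AI summary This paper proposes dFlowGRPO, a reinforcement learning framework for discrete flow models that supports a broader family of probability paths and non-masked source distributions. By deriving the full trajectory probability of discrete flow models and modeling denoising as a Markov decision process, it incorporates information from both the conditional transition rates and the posterior model during reinforcement learning. Experiments show that dFlowGRPO outperforms existing GRPO methods on text-to-image generation and shows strong capabilities on understanding tasks.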

Details
English abstract

Discrete flow models (DFMs) are a class of flexible generative models for generating discrete data, and diffusion large language models (dLLMs) can be viewed as a special case with a specific choice of mixture path and a masked source distribution. While several recent works have explored incorporating reinforcement learning into dLLMs, its application to more general discrete flow models remains underexplored. In this work, we present discrete Flow-GRPO (dFlowGRPO), a unified reinforcement learning framework for discrete flow models that supports a broad family of probability paths and non-masked source distributions. We derive the full trajectory probability for DFMs and formulate denoising as a Markov decision process, enabling dFlowGRPO to incorporate information from both the associated conditional transition rates and the posterior model during reinforcement learning. We apply dFlowGRPO to FUDOKI, a recent multimodal discrete flow model, and evaluate it on both image generation and multimodal understanding tasks. Empirical results show that dFlowGRPO outperforms existing GRPO-type methods for dLLMs on text-to-image generation tasks and achieves performance competitive with continuous flow-based models trained using FlowGRPO, while also demonstrating strong capabilities on understanding tasks.