arXivDaily arXiv daily academic digest, updated Monday to Friday
All subject categories 2085
2605.06727 2026-05-11 cs.LG cs.ET eess.IV

Medical Imaging Classification with Cold-Atom Reservoir Computing using Auto-Encoders and Surrogate-Driven Training

Nuno Batista, Ana Morgado, Oscar Ferraz, Sagar Silva Pratapsi, Jorge Lobo, Gabriel Falcao

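AI summary This paper proposes a hybrid quantum-classical framework based on neutral-atom quantum reservoir computing for medical image classification, targeting the binary task of polyp detection. To handle high-dimensional image data, the study introduces a guided auto-encoder that learns compact, discriminative image representations, and uses a differentiable surrogate model to work around the non-differentiability of quantum measurements, enabling end-to-end training. Experiments show that the method outperforms traditional approaches in classification accuracy and image reconstruction quality, demonstrating robustness and flexibility for medical imaging applications in the current NISQ era.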

Comments 8 pages, 6 figures. Accepted to the 2025 IEEE International Conference on Quantum AI (IEEE QAI). Supported by FCT and the Open Quantum Institute (OQI)

Details
Journal ref
2025 IEEE International Conference on Quantum Artificial Intelligence (QAI)
English abstract

We introduce a hybrid quantum-classical pipeline, based on neutral-atom reservoir computing, for medical image classification, focusing on the binary classification task of polyp detection. To deal effectively with the high dimensionality, we integrate a guided auto-encoder. This pipeline learns compact and discriminative representations of image data that are also well-suited for quantum reservoir computing. A key challenge in such systems is the non-differentiable nature of quantum measurements, which creates a 'gradient barrier' for standard training. We overcome this barrier by incorporating a differentiable surrogate model that emulates the quantum layer, enabling end-to-end backpropagation through the entire system. This guided training process is jointly optimized for classification accuracy and for faithful image recovery from the auto-encoder. The learned latent representations are encoded as pulse detuning parameters within a Rydberg Hamiltonian, and quantum embeddings are subsequently obtained through expectation values. These embeddings are then passed to a linear classifier. Our simulations show that this method outperforms some traditional approaches that use PCA or unguided autoencoders. We also conduct ablation studies to assess the impact of various quantum and training parameters, demonstrating the robustness and flexibility of our proposed pipeline for real-world medical imaging applications, even in the current NISQ era.

2605.06726 2026-05-11 cs.LG

Transformer-Based Wildlife Species Classification from Daily Movement Trajectories

Obed Irakoze, Prasenjit Mitra

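AI summary This paper studies how to identify species from daily wildlife movement trajectories alone, and proposes a Transformer-based sequence model for classification. Compared with baselines such as LSTM and CNN, the Transformer achieves higher balanced accuracy across several species classification tasks, notably a balanced accuracy of 0.83 and an AUC of 0.92 on the elephant binary classification task. The study also finds that richer movement feature descriptors markedly improve performance, especially for data-scarce species, and that a unified 1-hour temporal resolution improves overall classification results.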

Comments 8 pages

Details
English abstract

Inferring the identity of wildlife species from daily movement data alone is a challenging task. We train sequence models on large-scale, 7-species GPS trajectories from the Movebank platform. Trajectory models are evaluated using a protocol in which entire telemetry studies or regions are held out during testing. We compare Transformer-based sequence models to LSTM, CNN, and Temporal Convolutional Networks, and find that Transformers consistently achieve higher balanced accuracy with gains of approximately 8 to 22 percentage points, depending on the species and experimental setting. In an elephant binary classification task with 1-hour resolution, the Transformer achieves a balanced accuracy of 0.83 and an AUC of 0.92, substantially outperforming all baseline models. We examine, under data-limited conditions, feature representations by analyzing the differences between a basic displacement-based encoding and an expanded range of movement descriptors that include speed, direction, and turning behavior. With feature augmentation, we see clear performance gains, especially for underrepresented and sparsely represented species, such as large carnivores, lions, and zebras. Finally, experiments comparing 1-hour and 30-minute temporal resolutions show that while finer sampling can capture short-term movement patterns for some species, a unified 1-hour resolution yields more promising performance across studies by reducing missing data and ensuring consistent temporal coverage.
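
The expanded movement descriptors mentioned in this abstract (speed, direction, turning behavior) can be illustrated with a minimal sketch. It assumes positions are already projected to planar coordinates in metres and sampled at a fixed interval; the function name `movement_descriptors` and the dictionary keys are hypothetical and not taken from the paper.

```python
import math

def movement_descriptors(points, dt_hours=1.0):
    """Per-step descriptors from a list of (x, y) positions in metres.

    Assumes a fixed sampling interval of dt_hours between consecutive fixes.
    """
    steps = []
    prev_heading = None
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)      # displacement for this step
        heading = math.atan2(dy, dx)     # radians, counter-clockwise from +x
        if prev_heading is None:
            turn = 0.0                   # no previous step to compare against
        else:
            # wrap the heading change into [-pi, pi)
            turn = (heading - prev_heading + math.pi) % (2 * math.pi) - math.pi
        steps.append({
            "length": length,
            "speed": length / dt_hours,
            "heading": heading,
            "turn": turn,
        })
        prev_heading = heading
    return steps
```

A displacement-only encoding would keep just `length` (or `dx`, `dy`); the richer encoding adds the remaining fields per step before the sequence is fed to a model.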

2605.06724 2026-05-11 cs.LG cs.AI eess.SP

Enabling Unsupervised Training of Deep EEG Denoisers With Intelligent Partitioning

Qiyu Rao, Haozhe Tian, Homayoun Hamedmoghadam, Danilo Mandic

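AI summary This paper studies how to train deep electroencephalogram (EEG) denoising models without supervision. To address the problem that neural activity in wearable EEG is weak and spectrally overlaps with noise, it proposes iPSD, a self-supervised denoising method based on intelligent partitioning. The method needs no clean reference signal: it learns to partition an input EEG segment into independent noisy realizations that share the same underlying signal, which enables self-supervised training of the denoiser. Experiments show that iPSD performs well under extremely low signal-to-noise ratios and complex noise conditions, significantly outperforming existing methods.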

Details
English abstract

Denoising wearable electroencephalogram (EEG) is inherently challenging since neural activity is not only subtle but also inseparable from spectrally overlapping noise artifacts. Classical signal processing methods, relying on fixed or heuristic rules, cannot handle the time-varying pervasive artifacts in wearable EEGs. Deep learning methods, on the other hand, show promise in decomposition-free EEG denoising using highly expressive neural networks, but the training requires artifact-free EEG, which is inherently unobtainable. To address this, we propose Intelligent Partitioning for Self-supervised Denoising (iPSD). Our method eliminates the need for clean references by learning to partition an input EEG segment into independent noisy realizations with the same underlying signal. This enables self-supervision of deep learning denoisers, even in zero-shot settings where only a single EEG segment to be denoised is available. We validate iPSD through extensive experiments, including validations on wearable EEG from in-ear sensors. The results show that iPSD achieves state-of-the-art performance, most notably under extremely low signal-to-noise ratios (down to -10 dB) and challenging artifacts (e.g., EMG), with spectral fidelity orders of magnitude higher than competitive baselines.

2605.06723 2026-05-11 cs.AI cs.CL cs.LG

When Does a Language Model Commit? A Finite-Answer Theory of Pre-Verbalization Commitment

Long Zhang, Wei-neng Chen, Feng-feng Wei, Zi-bo Qin

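AI summary This paper studies when a language model forms a stable answer preference before it produces its final answer, and proposes a finite-answer theory of preference stabilization. By projecting the model's continuation probabilities onto a finite answer set, it defines an exact log-odds measure and uses it to analyze key quantities such as answer onset time and retrospective stabilization time. Experiments show that, without relying on greedy decoding or learned probes, the method detects preference stabilization before the answer becomes parseable, and that the signal is highly correlated with the model's eventual output while being interpretable and transferable.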

Details
English abstract

Language models often generate reasoning before giving a final answer, but the visible answer does not reveal when the model's answer preference became stable. We study this question through a narrow computable object: \emph{finite-answer preference stabilization}. For a model state and specified answer verbalizers, we project the model's own continuation probabilities onto a finite answer set; in binary tasks this yields an exact log-odds code, $δ(ξ)=S_θ(\mathrm{yes}\midξ)-S_θ(\mathrm{no}\midξ)$. This target defines parser-based answer onset, retrospective stabilization time, and lead without relying on greedy rollouts or learned probes. In controlled delayed-verdict tasks with Qwen3-4B-Instruct, the contextual finite-answer projection stabilizes before the answer is parseable, with 17--31 token mean lead in the main templates and positive, shorter lead in a parser-clean replication. The signal tracks the model's eventual output rather than truth, is linearly recoverable from compact hidden summaries, is partly separable from cursor progress, and transfers as shared information without a single invariant coordinate. Diagnostics separate the measurement from online stopping, verbalizer-free belief, and causal answer control; exact steering shows local sensitivity of $δ$ but not reliable generation control.
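
As a rough illustration of the binary log-odds quantity and of a retrospective stabilization time, the sketch below reads $S_θ$ as a log-score for each verbalizer and treats the preference as stable from the earliest position after which the sign of $δ$ never changes again. Both readings are assumptions made for this example; the paper's exact definitions may differ, and the function names are hypothetical.

```python
def log_odds(score_yes, score_no):
    """delta = S(yes | state) - S(no | state), with S read as a log-score."""
    return score_yes - score_no

def stabilization_index(deltas):
    """Earliest position from which sign(delta) matches the final sign.

    deltas: list of log-odds values, one per generated position.
    Returns None for an empty trace.
    """
    if not deltas:
        return None
    final_positive = deltas[-1] > 0
    t = len(deltas) - 1
    # walk backwards while the preference still agrees with the final one
    while t > 0 and (deltas[t - 1] > 0) == final_positive:
        t -= 1
    return t

def lead(deltas, answer_onset):
    """Tokens between stabilization and the position where the answer is parseable."""
    return answer_onset - stabilization_index(deltas)
```

Under these assumptions, a positive `lead` means the projected preference had already settled before the answer text could be parsed from the output.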

2605.06720 2026-05-11 cs.LG cs.AI

Conditional generation of antibody sequences with classifier-guided germline-absorbing discrete diffusion

Justin Sanders, Luca Giancardo, Lan Guo, Yue Zhao, Kemal Sonmez, Nina Cheng, Melih Yilmaz

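AI summary This study addresses conditional generation of antibody sequences and proposes a classifier-guided discrete diffusion model that aims to overcome the limitations of existing methods in generating biologically meaningful somatic variation and in flexible conditional generation. Its core method, germline-absorbing diffusion, uses the germline sequence as the absorbing state of the diffusion process, guiding the model to learn the path from germline to mature antibody sequence and effectively reducing germline bias. Experiments show that the method performs well on both non-germline residue prediction and conditional generation tasks, significantly outperforming existing methods.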

Comments 9 pages, 2 figures, 2 tables

Details
English abstract

Antibody therapeutics are among the most successful modern medicines, yet computationally designing antibodies with desirable binding and developability properties remains challenging. While protein language models (pLMs) have emerged as powerful tools for antibody sequence design, existing approaches largely suffer from two key limitations: they predominantly memorize germline sequences rather than modeling biologically meaningful somatic variation, and they offer limited support for flexible classifier-guided conditional generation. We address these challenges through two primary contributions. First, we demonstrate that discrete diffusion fine-tuning achieves strong language modeling performance on antibody sequences while allowing for generation conditioned on any off-the-shelf classifier. Second, we introduce germline absorbing diffusion, a novel modification of the discrete diffusion noise process in which the germline sequence, rather than a masked sequence, serves as the absorbing state. This biologically motivated inductive bias restricts the model to learning the trajectory from germline to observed sequence, effectively excluding genetic variation and V(D)J recombination statistics from the learned distribution and dramatically mitigating germline bias. We show that germline diffusion improves non-germline residue prediction accuracy from 26 percent to 46 percent, approaching the theoretical upper bound set by true biological variability. We then demonstrate the utility of our germline diffusion model on the conditional generation tasks of sampling antibodies with improved hydrophobicity and predicted binding affinity. On both tasks our model shows an improved tradeoff between class adherence and sample quality, significantly outperforming EvoProtGrad, a popular strategy to sample from pLMs with gradient-based discrete Markov Chain Monte Carlo.
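
To make the idea of a germline absorbing state concrete, here is a minimal sketch of a forward noising step. It assumes the observed and germline sequences are already aligned to equal length, that positions are corrupted independently, and that the corruption probability equals the noise level `t`; these choices and the function name `germline_absorb` are illustrative assumptions, not the authors' implementation.

```python
import random

def germline_absorb(observed, germline, t, rng):
    """Forward corruption toward the germline sequence.

    Each position independently reverts to its germline residue with
    probability t (0 keeps the observed sequence, 1 yields the germline).
    """
    if len(observed) != len(germline):
        raise ValueError("sequences must be aligned to the same length")
    return "".join(
        g if rng.random() < t else o
        for o, g in zip(observed, germline)
    )

# Example: partially corrupt a toy sequence toward its toy germline.
rng = random.Random(0)
noisy = germline_absorb("QVQLVESGA", "QVQLQESGG", t=0.5, rng=rng)
```

In a standard absorbing-state process the corrupted positions would become a mask token; here they become germline residues, so positions where the observed sequence already equals the germline carry no signal and the model only has to learn the non-germline changes.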

2605.06716 2026-05-11 cs.AI cs.CL

From Storage to Experience: A Survey on the Evolution of LLM Agent Memory Mechanisms

Jinghao Luo, Yuchen Tian, Chuxue Cao, Ziyang Luo, Hongzhan Lin, Kaixin Li, Chuyi Kong, Ruichao Yang, Jing Ma

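AI summary This survey reviews the evolution of memory mechanisms in large language model (LLM)-based agents and proposes a new evolutionary framework that divides the development into three stages (Storage, Reflection, and Experience) and identifies three core drivers of that evolution. It also examines two key mechanisms of the Experience stage, proactive exploration and cross-trajectory abstraction, providing a theoretical foundation and a clear development path for the design of next-generation LLM agents.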

Comments Accepted by ACL 2026 Findings

Details
English abstract

Large Language Model (LLM)-based agents have fundamentally reshaped artificial intelligence by integrating external tools and planning capabilities. While memory mechanisms have emerged as the architectural cornerstone of these systems, current research remains fragmented, oscillating between operating system engineering and cognitive science. This theoretical divide prevents a unified view of technological synthesis and a coherent evolutionary perspective. To bridge this gap, this survey proposes a novel evolutionary framework for LLM agent memory mechanisms, formalizing the development process into three stages: Storage (trajectory preservation), Reflection (trajectory refinement), and Experience (trajectory abstraction). We first formally define these three stages before analyzing the three core drivers of this evolution: the necessity for long-range consistency, the challenges in dynamic environments, and the ultimate goal of continual learning. Furthermore, we specifically explore two transformative mechanisms in the frontier Experience stage: proactive exploration and cross-trajectory abstraction. By synthesizing these disparate views, this work offers robust design principles and a clear roadmap for the development of next-generation LLM agents.

2605.06714 2026-05-11 cs.CV cs.AI

Edge Deep Learning in Computer Vision and Medical Diagnostics: A Comprehensive Survey

Yiwen Xu, Tariq M. Khan, Yang Song, Erik Meijering

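AI summary This survey reviews recent advances in edge deep learning for computer vision and medical diagnostics, focusing on its foundational principles, technical advantages, and practical applications. It proposes a categorisation of edge hardware platforms by performance and usage scenario, and summarises key techniques for efficiently deploying deep neural networks on edge devices, such as lightweight design and model compression. Through an analysis of practical use cases, it illustrates the impact of edge deep learning in real-world settings and identifies future research directions and challenges, offering a comprehensive reference for researchers and practitioners.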

Details
Journal ref
Artificial Intelligence Review, Volume 58, Article number 93 (2025)
English abstract

Edge deep learning, a paradigm change reconciling edge computing and deep learning, facilitates real-time decision making attuned to environmental factors through the close integration of computational resources and data sources. Here we provide a comprehensive review of the current state of the art in edge deep learning, focusing on computer vision applications, in particular medical diagnostics. An overview of the foundational principles and technical advantages of edge deep learning is presented, emphasising the capacity of this technology to revolutionise a wide range of domains. Furthermore, we present a novel categorisation of edge hardware platforms based on performance and usage scenarios, facilitating platform selection and operational effectiveness. Following this, we dive into approaches to effectively implement deep neural networks on edge devices, encompassing methods such as lightweight design and model compression. Reviewing practical applications in the fields of computer vision in general and medical diagnostics in particular, we demonstrate the profound impact edge-deployed deep learning models can have in real-life situations. Finally, we provide an analysis of potential future directions and obstacles to the adoption of edge deep learning, with the intention to stimulate further investigations and advancements of intelligent edge deep learning solutions. This survey provides researchers and practitioners with a comprehensive reference shedding light on the critical role deep learning plays in the advancement of edge computing applications.

2605.06708 2026-05-11 cs.CV cs.AI

Visual Text Compression as Measure Transport

Lv Tang, Tianyi Zheng, Yang Liu, Bo Li, Xingyu Li

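AI summary This study examines the efficiency and effectiveness of visual text compression (VTC) for long text and proposes a measure-transport framework for quantifying the task-relevant information loss caused by visual encoding. Treating text and visual tokens as empirical probability measures, it analyzes the push-forward map induced by the ViT encoder and the decomposition of its transport cost into precision and coverage costs. On this basis it proposes a routing criterion and a foveation mechanism that require no downstream labels, improving model performance across multiple natural language processing tasks.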

Details
English abstract

Visual text compression (VTC) promises efficient long-context processing by rendering text into an image and re-encoding it with a vision-language model, often producing $3$--$20\times$ fewer decoder tokens than subword tokenization. Yet token savings do not translate predictably into downstream utility: on some tasks the visual path matches or exceeds the text path, on others it collapses, and the compression ratio itself does not predict which regime will occur. The missing quantity is therefore not another summary of efficiency, but a principled measure of task-relevant information loss induced by visual encoding. We address this problem by formulating VTC in the language of measure transport. Treating text and visual tokens as empirical probability measures, we show that the ViT patch encoder induces a push-forward map whose transport cost decomposes into a precision cost from within-patch aggregation and a coverage cost from cross-patch fragmentation. Both terms are estimable from downstream-label-free probes. This formulation yields two operational consequences: a downstream-label-free routing criterion that selects whether to use the visual path for a given input or benchmark instance, and a transport-informed foveation mechanism that re-encodes high-cost regions at higher resolution. Across $24$ NLP datasets at Qwen3-4B, our label-free rule matches the per-dataset oracle on $17/24$ datasets ($70.8\%$), and improves the average task score by $+3.3\%$ with $-10.3\%$ average tokens relative to a pure-LLM.

2605.06702 2026-05-11 cs.AI cs.CL cs.LG

CASCADE: Case-Based Continual Adaptation for Large Language Models During Deployment

Siyuan Guo, Yali Du, Hechang Chen, Yi Chang, Jun Wang

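AI summary This paper proposes CASCADE, a framework for continual adaptation of large language models during deployment, which aims to address the halt in learning caused by the traditional separation of training and deployment. CASCADE builds a dynamic case memory that lets the model keep learning and improving from deployment experience without modifying its parameters. The method formulates experience reuse as a contextual bandit problem, balancing exploration and exploitation, and significantly improves model performance across multiple tasks.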

Details
English abstract

Large language models (LLMs) have become a central foundation of modern artificial intelligence, yet their lifecycle remains constrained by a rigid separation between training and deployment, after which learning effectively ceases. This limitation contrasts with natural intelligence, which continually adapts through interaction with its environment. In this paper, we formalise deployment-time learning (DTL) as the third stage in the LLM lifecycle that enables LLM agents to improve from experience during deployment without modifying model parameters. We present CASCADE (CASe-based Continual Adaptation during DEployment), a general and principled framework that equips LLM agents with an explicit, evolving episodic memory. CASCADE formulates experience reuse as a contextual bandit problem, enabling principled exploration-exploitation trade-offs and establishing no-regret guarantees over long-term interactions. This design allows agents to accumulate, select, and refine task-relevant cases, transforming past experience into actionable knowledge. Across 16 diverse tasks spanning medical diagnosis, legal analysis, code generation, web search, tool use, and embodied interaction, CASCADE improves macro-averaged success rate by 20.9% over zero-shot prompting while consistently outperforming gradient-based and memory-based baselines. By reframing deployment as an adaptive learning process, this work establishes a foundation for continually improving AI systems.

2605.06696 2026-05-11 cs.AI cs.LG cs.MA

Hidden Coalitions in Multi-Agent AI: A Spectral Diagnostic from Internal Representations

Cameron Berg, Susan L. Schneider, Mark M. Bailey

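AI summary This study examines hidden coalition structure in multi-agent systems and proposes a spectral analysis method based on internal neural representations for detecting latent informational coupling and coalition relationships among agents. The method builds a mutual-information graph over the agents' hidden states and applies spectral clustering to identify the most salient coalition boundary, distinguishing genuine informational coupling from spurious similarity caused by behavioral coordination. Experiments show that the method accurately reveals hidden coalition structure in both reinforcement learning and large language model settings, providing a useful tool for monitoring emergent organization in distributed AI systems.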

Comments 18 pages

Details
English abstract

Collections of interacting AI agents can form coalitions, creating emergent group-level organization that is critical for AI safety and alignment. However, observing agent behavior alone is often insufficient to distinguish genuine informational coupling from spurious similarity, as consequential coalitions may form at the level of internal representations before any overt behavioral change is apparent. Here, we introduce a practical method for detecting coalition structure from the internal neural representations of multi-agent systems. The approach constructs a pairwise mutual-information graph from the hidden states of agents and applies spectral partitioning to identify the most salient coalition boundary. We validate this method in two domains. First, in multi-agent reinforcement learning environments, the method successfully recovers programmed hierarchical and dynamic coalition structures and correctly rejects false positives arising from behavioral coordination without informational coupling. Second, using a large language model, the method identifies coalition structures implied by descriptive prompts, tracks dynamic team reassignments, and reveals a representational hierarchy where explicit labels dominate over conflicting interaction patterns. Across both settings, the recovered partition reveals subgroup organization that a scalar cross-agent mutual-information measure cannot distinguish. The results demonstrate that analyzing hidden-state mutual information through spectral partitioning provides a scalable diagnostic for identifying representational coalitions, offering a valuable tool for monitoring emergent structure in distributed AI systems.
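
The spectral partitioning step can be sketched as follows, taking a precomputed symmetric matrix of pairwise mutual-information estimates as input. The sketch uses the unnormalized graph Laplacian and splits agents by the sign of the Fiedler vector; the choice of Laplacian, the function name, and the toy matrix are assumptions for illustration, and the mutual-information estimation itself is not shown.

```python
import numpy as np

def spectral_bipartition(W):
    """Split nodes into two groups using the Fiedler vector.

    W: symmetric (n, n) matrix of non-negative pairwise weights,
       here standing in for mutual-information estimates.
    Returns a boolean array; equal values mean same group.
    """
    W = np.asarray(W, dtype=float)
    degree = np.diag(W.sum(axis=1))
    laplacian = degree - W                  # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(laplacian)     # eigenvalues in ascending order
    fiedler = vecs[:, 1]                    # second-smallest eigenvector
    return fiedler >= 0

# Toy example: agents {0, 1} and {2, 3} are strongly coupled within pairs.
W_toy = [
    [0.00, 1.00, 0.01, 0.01],
    [1.00, 0.00, 0.01, 0.01],
    [0.01, 0.01, 0.00, 1.00],
    [0.01, 0.01, 1.00, 0.00],
]
groups = spectral_bipartition(W_toy)
```

A single scalar such as the average cross-agent mutual information would not show which agents belong together; the partition above recovers that grouping from the same matrix.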

2605.06690 2026-05-11 cs.AI cs.CL cs.LG

State Representation and Termination for Recursive Reasoning Systems

Debashis Guha, Amritendu Mukherjee, Sanjay Kukreja, Tarun Kumar

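AI summary This paper studies state representation and termination conditions in recursive reasoning systems. It proposes a representation based on an epistemic state graph that encodes the claims, evidential relations, open questions, and confidence weights arising during reasoning. By defining the order-gap as a measure of how iteration order affects the reasoning result, it gives a condition for deciding when to stop iterating and proves the non-degeneracy of that condition near the fixed point, providing a theoretical basis for termination in recursive reasoning systems. The method can be applied to agent loops, tree-of-thought reasoning, theorem proving, continual learning, and other areas.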

Details
English abstract

Recursive reasoning systems alternate between acquiring new evidence and refining an accumulated understanding. Two design choices are typically left implicit: how to represent the evolving reasoning state, and when to stop iterating. This paper addresses both. We represent the reasoning state as an epistemic state graph encoding extracted claims, evidential relations, open questions, and confidence weights. We define the order-gap as the distance between the states reached by expand-then-consolidate versus consolidate-then-expand; a small order-gap suggests that the two orderings agree and further iteration is unlikely to help. Our main result gives a necessary and sufficient condition for the linearised order-gap to be non-degenerate near the fixed point, showing when the criterion is informative rather than algebraically vacuous. This is a local condition, not a global convergence guarantee. We apply the framework to recursive reasoning systems and sketch its application to agent loops, tree-of-thought reasoning, theorem proving, and continual learning.
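
The order-gap can be illustrated on a toy state. The sketch below replaces the epistemic state graph with a plain vector and measures the gap with the Euclidean norm; both substitutions, along with the toy `expand` and `consolidate` operators, are assumptions made only to show the expand-then-consolidate versus consolidate-then-expand comparison.

```python
import numpy as np

def order_gap(expand, consolidate, state):
    """Distance between the two orderings applied to the same state."""
    expand_first = consolidate(expand(state))
    consolidate_first = expand(consolidate(state))
    return float(np.linalg.norm(expand_first - consolidate_first))

# Toy operators on a vector state (hypothetical stand-ins).
add_evidence = lambda s: s + 1.0      # "expand": shift every component
shrink = lambda s: 0.5 * s            # "consolidate": contract the state
rescale = lambda s: 2.0 * s           # a second scaling that commutes with shrink

state = np.zeros(4)
gap_noncommuting = order_gap(add_evidence, shrink, state)
gap_commuting = order_gap(rescale, shrink, state)
```

When the two operators commute on the current state the gap is zero, which matches the reading that the orderings agree; a stopping rule would compare the gap against a tolerance chosen by the user.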

2605.06686 2026-05-11 cs.LG econ.EM stat.AP stat.ML

Robustness of Refugee-Matching Gains to Off-Policy Evaluation Choices

Kirk Bansak, Elisabeth Paulson, Dominik Rothenhäusler, Jeremy Ferwerda, Jens Hainmueller, Michael Hotard

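AI summary This paper studies the robustness of counterfactual impact evaluation results to the choice of off-policy evaluation method in the context of refugee matching policy in the United States. Applying several evaluation methods, including inverse probability weighting (IPW) and augmented inverse probability weighting (AIPW), combined with different model architectures and assignment procedures, the study finds that the impact estimates remain consistent in magnitude regardless of the method used and are statistically significant in most cases. The results are also highly consistent with the original findings of Bansak et al. (2018).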

Comments 13 pages, 2 figures, 10 tables

Details
English abstract

Previous research has investigated the potential of refugee matching for boosting refugee outcomes, first considered by Bansak et al. (2018). This paper demonstrates the stability of counterfactual impact evaluation results in the context of refugee matching in the United States using a range of off-policy evaluation methods. In order to estimate counterfactual impact and test the robustness of our results, we employ several evaluation methods, including inverse probability weighting (IPW) and multiple variants of augmented inverse probability weighting (AIPW). We also consider various modifications, including alternative modeling architectures and different assignment procedures. The impact estimates remain consistent in magnitude in all scenarios as well as statistically significant in most cases. Furthermore, the estimates are also consistent with the results originally presented in Bansak et al. (2018).
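
For readers unfamiliar with the two estimator families named here, the following sketch shows textbook IPW and AIPW estimates of the mean outcome under a deterministic target assignment policy. The variable names, the deterministic-policy setup, and the toy data are assumptions for illustration; the paper's specific AIPW variants and outcome models are not reproduced.

```python
import numpy as np

def ipw_value(y, a, target, e_logged):
    """IPW estimate of the mean outcome under the target policy.

    y: observed outcomes; a: logged assignment per unit;
    target: assignment the target policy would make per unit;
    e_logged: probability of the logged assignment under the logging policy.
    """
    y, a, target, e = map(np.asarray, (y, a, target, e_logged))
    weights = (a == target) / e
    return float(np.mean(weights * y))

def aipw_value(y, a, target, e_logged, mu_hat):
    """AIPW: outcome-model prediction plus an IPW-weighted residual correction.

    mu_hat[i, k]: predicted outcome for unit i under assignment k.
    """
    y, a, target, e = map(np.asarray, (y, a, target, e_logged))
    mu_hat = np.asarray(mu_hat, dtype=float)
    idx = np.arange(len(y))
    direct = mu_hat[idx, target]
    weights = (a == target) / e
    correction = weights * (y - mu_hat[idx, a])
    return float(np.mean(direct + correction))

# Toy data: four units, two possible locations, uniform logging policy.
y = [1, 0, 1, 1]
a = [0, 1, 0, 1]
target = [0, 0, 0, 1]
e_logged = [0.5, 0.5, 0.5, 0.5]
```

Comparing the two estimates under different outcome models `mu_hat` is one simple way to probe the kind of robustness the abstract describes.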

2605.06685 2026-05-11 cs.SD eess.AS stat.AP

An audio-to-analysis pipeline with certified transcription for information-theoretic profiling of the piano repertoire

Fred Jalbert-Desforges

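AI summary This paper presents an analysis pipeline that produces composer-level information-theoretic profiles directly from audio. It extracts harmonic scale-degree distributions through a certified transcription layer (F1 of 0.9791 on the MAESTRO dataset) and analyzes them with Shannon entropy, asymmetric KL divergence, and Zipfian modeling. The study reveals an interpretable ordering of composers by harmonic predictability, reproduces known stylistic lineages, and identifies a marked difference between contemporary minimalist composers and historical composers in their harmonic transition distributions.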

Comments 25 pages, 4 figures, 25 references

Details
English abstract

We present an audio-to-analysis pipeline that produces composer-level information-theoretic profiles (reflecting compositional vocabulary as it emerges from aggregated performances) from raw recordings, built on a transcription layer whose accuracy we certify on a standard benchmark (F1 = 0.9791 on the MAESTRO v3.0.0 test set). Applied to 1,238 pieces and 15 MAESTRO composers with at least ten attributed pieces, spanning the Baroque through the early twentieth century, the pipeline derives empirical distributions over harmonic scale degrees and analyzes them through Shannon entropy, asymmetric Kullback-Leibler divergence, and Zipfian rank-frequency modeling. The resulting profiles (i) order composers along an interpretable axis of harmonic predictability, with a narrow entropy range (3.33-3.86 bits) that reveals the marginal-level similarity of tonal vocabularies; (ii) recover known stylistic lineages (Haydn-Beethoven, Liszt-Rachmaninoff, Schubert-Schumann) through the smallest KL divergences in the corpus, with Mendelssohn emerging as a stable outlier within this corpus; and (iii) separate contemporary neoclassical artists (Richter, Frahm, Glass, Arnalds, Jóhannsson) from historical composers on the quality of Zipfian fit to the transition distribution, with mean $R^2 = 0.78$ for neoclassical versus 0.46 for historical (N $\geq$ 10 pieces each). This gap is larger than the spread within either group and is consistent with a minimalist compositional tendency: a compact transition vocabulary used with sharper frequency-rank regularity than historical composers. All estimates are reported with Laplace-smoothed bootstrap 95% confidence intervals.
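
The two information-theoretic quantities used here, Shannon entropy in bits and the asymmetric Kullback-Leibler divergence, can be computed from raw counts as in the sketch below. Add-one (Laplace) smoothing with `alpha=1.0` and the function names are assumptions for illustration; the paper's bootstrap procedure is not shown.

```python
import math

def smooth(counts, alpha=1.0):
    """Laplace-smoothed probabilities from raw category counts."""
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]

def entropy_bits(p):
    """Shannon entropy H(p) in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def kl_bits(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; q must be positive where p is."""
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)
```

Because `kl_bits(p, q)` and `kl_bits(q, p)` generally differ, a pairwise comparison of composers yields a directed matrix rather than a symmetric distance, and smoothing keeps the divergence finite when one composer never uses a category that another does.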

2605.06684 2026-05-11 cs.LG

From Canopy to Collision: A Hybrid Predictive Framework for Identifying Risk Factors in Tree-Involved Traffic Crashes

Abdul Azim, Ahmed Hossain, Soumyadip Maitra, Panick Kalambay

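AI summary This study addresses tree-involved traffic crashes and proposes a hybrid predictive framework for identifying and quantifying the risk factors that affect crash severity. Using a crash report database covering 2020 to 2023, it combines a classification model, the SHAP explanation tool, and a logistic regression model to analyze the effects of factors such as seat belt use, vehicle age, speeding, and driver condition, and reveals key interactions among them. The study finds that not wearing a seat belt is the single most important contributor to severe outcomes, while vehicle age, speeding, and driver condition also significantly affect crash severity, providing an important basis for targeted safety interventions.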

Comments 30 pages, 10 figures

Details
English abstract

Tree-involved crashes represent a critical subset of run-off-road (ROR) collisions, often resulting in fatal or severe injuries due to high-energy impacts. This study develops a comprehensive analytical framework to identify and quantify risk factors contributing to crash severity in tree-involved collisions using the Crash Report Sampling System (CRSS) database spanning 2020-2023. The modeling framework follows a multi-step process. First, a machine-learning-based classification model (CatBoost) identifies key factors associated with binary crash injury severity (KA: fatal or incapacitating injury versus BC: non-incapacitating or possible injury). Second, the SHapley Additive exPlanations (SHAP) tool is used to quantify and visualize the marginal effects of top influential factors on crash severity. Third, a binary logistic regression model estimates factor effects and validates SHAP-derived importance measures. Finally, SHAP interaction plots examine the combined effects of key contributing factors. Results reveal restraint non-use as the most influential predictor, with unrestrained occupants nearly three times more likely to experience severe outcomes due to ejection risk. Vehicle age, speeding violations, and driver impairment demonstrate substantial effects, reflecting reduced crashworthiness, increased impact forces, and reduced control capabilities. Critical interactions emerge between lighting conditions and vehicle age, speeding and lighting conditions, restraint use and vehicle age, and road surface and speeding, demonstrating additive risk effects with specific interactions. These findings provide critical insights for targeted safe system-based interventions, including enhanced seat belt enforcement, speed management in reduced visibility conditions, and vehicle fleet modernization.

2605.06683 2026-05-11 cs.LG cs.AI cs.CL

Toeplitz MLP Mixers are Low Complexity, Information-Rich Sequence Models

Benjamin L. Badger, Ethan Roland

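AI summary This paper proposes the Toeplitz MLP Mixer (TMM), a new sequence model architecture that aims to address the computational complexity limitations of Transformer-based large language models. TMM replaces attention with triangular-masked Toeplitz matrix multiplication, achieving lower time and space complexity in training and greater efficiency at inference. Experiments show that TMM outperforms existing sub-quadratic models in information retention and in-context learning, and an analysis based on operator index theory indicates that the Toeplitz layers of its causal non-invertible models are more likely to be invertible.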

Details
English abstract

Transformer-based large language models are in some respects limited by the quadratic time and space computational complexity of attention. We introduce the Toeplitz MLP Mixer (TMM), a transformer-like architecture that swaps attention for triangular-masked Toeplitz matrix multiplication over the sequence dimension resulting in $\mathcal{O} (dn \log n)$ time and $\mathcal O(dn)$ space complexity during training and $\mathcal O(dn)$ time and space at inference prefill. Despite the lack of sophisticated input modulation or state maintenance present in other sub-quadratic architectures, TMMs yield greater training efficiency in terms of loss achieved per compute and device memory. We demonstrate that TMMs are capable of retaining more input information resulting in improved copying ability, which we argue results from a lack of architectural biases. Consistent with higher input information retention, TMMs exhibit superior information retrieval and in-context learning benchmark accuracy compared to comparable architectures. We conclude with an analysis from the perspective of operator index theory and show that, counterintuitively, trained Toeplitz layers of causal non-invertible models are more likely to be invertible or nearly so than models that are actually invertible over their inputs.
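
The $\mathcal{O}(dn \log n)$ training cost quoted above follows from the fact that multiplying by a lower-triangular Toeplitz matrix is a causal convolution, which can be evaluated with FFTs. The sketch below checks this equivalence in NumPy; sharing a single kernel across all `d` feature channels is a simplifying assumption, and the function names are hypothetical rather than the authors' code.

```python
import numpy as np

def causal_toeplitz_dense(kernel, x):
    """Reference O(n^2 d) version: build the masked Toeplitz matrix explicitly."""
    n = len(kernel)
    T = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1):           # lower triangle only (causal mask)
            T[i, j] = kernel[i - j]
    return T @ x

def causal_toeplitz_fft(kernel, x):
    """O(d n log n) version using zero-padded FFTs.

    kernel: shape (n,); x: shape (n, d), sequence dimension first.
    """
    n = len(kernel)
    m = 2 * n                            # padding avoids circular wrap-around
    K = np.fft.rfft(kernel, m)
    X = np.fft.rfft(x, m, axis=0)
    y = np.fft.irfft(K[:, None] * X, m, axis=0)
    return y[:n]

rng = np.random.default_rng(0)
kernel = rng.normal(size=16)
tokens = rng.normal(size=(16, 3))
dense_out = causal_toeplitz_dense(kernel, tokens)
fft_out = causal_toeplitz_fft(kernel, tokens)
```

Each output position depends only on earlier positions, which is what makes the mixing layer usable for autoregressive modeling.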

2605.06682 2026-05-11 cs.AI cs.CY

Fast and Effective Redistricting Optimization via Composite-Move Tabu Search

Hai Jin, Diansheng Guo

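AI summary This paper proposes a composite-move Tabu search algorithm (CM-Tabu) for optimization problems in spatial redistricting. The method systematically expands the feasible neighborhood space, improving search efficiency and solution quality while preserving district contiguity. Experiments show that, compared with traditional Tabu search and other baselines, CM-Tabu substantially improves solution quality, robustness, and computational efficiency; in the Philadelphia case in particular, it consistently reaches the theoretical optimum for population equality and supports multi-objective trade-offs.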

详情
英文摘要

Spatial redistricting is a practical combinatorial optimization problem that demands high-quality solutions, rapid turnaround, and flexibility to accommodate multi-criteria objectives and interactive refinement. A central challenge is the contiguity constraint: enforcing contiguity in integer-programming or heuristic search can severely shrink the feasible neighborhood, weaken exploration, and trap the search in poor local optima. We introduce a composite-move Tabu search (CM-Tabu) that systematically expands the feasible neighborhood space in Tabu search while preserving contiguity. When a boundary unit cannot be reassigned individually without disconnecting its district, our method identifies a minimal set of units that can move together, or a pair of units (or sets of units) that can be switched, as a contiguity-preserving composite move. Candidate single-unit and composite moves are generated in linear time by analyzing each district's contiguity graph using articulation points and biconnected components. Extensive experiments demonstrate that the proposed approach substantially improves solution quality, run-to-run robustness, and computational efficiency relative to traditional Tabu search and other baselines. For example, in the Philadelphia case, the approach can consistently attain the theoretical global optimum in population-equality and support multi-criteria trade-offs. CM-Tabu delivers optimization performance suitable for real-world practices and decision-support workflows.
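
The role of articulation points in this abstract can be illustrated with the standard depth-first-search algorithm: a unit that is an articulation point of its district's contiguity graph cannot be reassigned on its own without disconnecting the district. The sketch below is the textbook linear-time procedure on an adjacency dictionary, not the authors' composite-move generator; the recursive form is chosen for brevity and would need an iterative rewrite for very large districts.

```python
def articulation_points(adj):
    """Articulation points of an undirected graph.

    adj: dict mapping each unit to the set of adjacent units in the same
         district. Unit identifiers must not be None.
    """
    disc, low, result = {}, {}, set()
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        children = 0
        for v in adj[u]:
            if v == parent:
                continue
            if v in disc:
                # back edge: u can reach an earlier-discovered unit
                low[u] = min(low[u], disc[v])
            else:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # v's subtree cannot bypass u, so removing u disconnects it
                if parent is not None and low[v] >= disc[u]:
                    result.add(u)
        # the DFS root is an articulation point only with 2+ DFS children
        if parent is None and children > 1:
            result.add(u)

    for node in adj:
        if node not in disc:
            dfs(node, None)
    return result

# A chain of three units: the middle one cannot move alone.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
# A ring of four units: any single unit can move without breaking contiguity.
ring = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {3, 1}}
```

Boundary units outside the returned set are candidates for single-unit moves; units inside it are the cases where a composite move, as described in the abstract, would be needed.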

2605.06680 2026-05-11 cs.LG cs.CV physics.flu-dyn

On the Role of Strain and Vorticity in Numerical Integration Error for Flow Matching

Chenxi Tao, Seung-Kyum Choi

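AI summary This paper studies how the strain and vorticity of the velocity field in flow matching affect numerical integration error. By decomposing the velocity Jacobian into a symmetric part (strain rate) and an antisymmetric part (vorticity), it reveals the different roles the two play in error propagation. The study proves that strain governs exponential error amplification while vorticity contributes only linearly to the local truncation error, and on this basis proposes an optimization method based on weighted Jacobian regularization; experiments show that the method is effective at reducing integration error and improving generation quality.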

Comments 16 pages, 7 figures. Preliminary version. Includes qualitative CIFAR-10 comparison and supporting synthetic experiments

Details
English abstract

Flow matching generates data by integrating a learned velocity field, where the number of integration steps (NFE) directly determines inference cost. We analyze which properties of the velocity field govern integration error by decomposing the velocity Jacobian into its symmetric part S (strain rate) and antisymmetric part Omega (vorticity). We prove that strain and vorticity play different roles: strain controls exponential error amplification through the logarithmic norm, while vorticity contributes only linearly to the local truncation error. We further show that the optimal transport velocity field is irrotational and has zero material derivative, implying second-order Euler accuracy; for exact displacement interpolation, the associated Lagrangian particle dynamics are integrated exactly by Euler. Motivated by this analysis, we study weighted Jacobian regularization with strain weight alpha and vorticity weight beta. Experiments on 2D synthetic data confirm the main theoretical predictions, showing up to 2.7x lower integration error at NFE=5. Preliminary CIFAR-10 experiments show consistent trends, with a lightweight fine-tuning procedure improving FID by 14 percent at NFE=10 while preserving high-NFE quality.
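
The decomposition described here is the standard split of a square matrix into symmetric and antisymmetric parts, and the logarithmic 2-norm equals the largest eigenvalue of the symmetric part. The sketch below computes both for a given Jacobian; the function names are hypothetical and the Jacobian is supplied directly rather than obtained from a trained velocity network.

```python
import numpy as np

def strain_vorticity(J):
    """Split a Jacobian into strain rate S (symmetric) and vorticity Omega (antisymmetric)."""
    J = np.asarray(J, dtype=float)
    S = 0.5 * (J + J.T)
    Omega = 0.5 * (J - J.T)
    return S, Omega

def log_norm_2(J):
    """Logarithmic 2-norm: the largest eigenvalue of the symmetric part of J."""
    S, _ = strain_vorticity(J)
    return float(np.linalg.eigvalsh(S)[-1])   # eigvalsh returns ascending order

rotation = [[0.0, -1.0], [1.0, 0.0]]   # pure vorticity, zero strain
shear = [[1.0, 2.0], [0.0, -1.0]]      # mixed strain and vorticity
```

For a pure rotation the logarithmic norm is zero, so the exponential amplification factor stays at one, which is consistent with the abstract's claim that strain rather than vorticity drives exponential error growth.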

2605.06679 2026-05-11 cs.LG

Breaking the Illusion: When Positive Meets Negative in Multimodal Decoding

Yubo Jiang, Yitong An, Xin Yang, Abudukelimu Wuerkaixi, Xuxin Cheng, Fengying Xie, Zhiguo Jiang, Cao Liu, Ke Zeng, Haopeng Zhang

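AI summary Vision-language models (VLMs) often hallucinate content inconsistent with the visual input because they rely too heavily on language priors. This paper proposes Positive-and-Negative Decoding (PND), a training-free inference framework that introduces a positive and a negative path during decoding, respectively amplifying visual evidence and constructing counterfactuals to suppress language-dominated generation, thereby improving visual fidelity. Experiments show that PND achieves state-of-the-art performance on multiple benchmarks.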

Comments Accepted by CVPR 2026 (Conference on Computer Vision and Pattern Recognition). 11 pages, 5 figures. Code available at: https://github.com/JiangYubo4399/PND

Details
English abstract

Vision-Language Models (VLMs) are frequently undermined by object hallucination, generating content that contradicts visual reality, due to an over-reliance on linguistic priors. We introduce Positive-and-Negative Decoding (PND), a training-free inference framework that intervenes directly in the decoding process to enforce visual fidelity. PND is motivated by our finding of an attention imbalance in VLMs, where visual features are under-weighted. Our framework introduces a dual-path contrast: a positive path that amplifies visual evidence and a negative path that constructs counterfactuals to penalize prior-dominant generation. By contrasting outputs from both paths during decoding, PND steers generation toward visually grounded results. Experiments on POPE, MME, and CHAIR demonstrate state-of-the-art performance without retraining.
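The abstract does not give PND's scoring rule, so the sketch below uses a generic contrastive-decoding form, `(1 + alpha) * positive - alpha * negative`, purely to illustrate what contrasting two paths at one decoding step can look like. The formula, the `alpha` parameter, the vocabulary, and the logit values are all assumptions for this example.

```python
def contrast_logits(positive, negative, alpha=1.0):
    """Generic contrastive combination of two logit vectors (assumed form, not PND's)."""
    return [(1.0 + alpha) * p - alpha * n for p, n in zip(positive, negative)]


def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=lambda i: values[i])


vocab = ["dog", "cat", "frisbee"]          # hypothetical three-token vocabulary
positive = [2.0, 1.0, 1.8]                 # path with amplified visual evidence
negative = [2.0, 1.0, 0.5]                 # counterfactual, prior-dominant path

plain_choice = vocab[argmax(positive)]
contrast_choice = vocab[argmax(contrast_logits(positive, negative))]
```

Tokens that score equally on both paths are left unchanged, while a token supported mainly by the visually grounded path gains score, which is why the choice moves from "dog" to "frisbee" in this toy case.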

2605.06678 2026-05-11 cs.LG q-fin.RM stat.AP

A Wasserstein GAN-based climate scenario generator for risk management and insurance: the case of soil subsidence

Antoine Heranval, Olivier Lopez, Didier Ngatcha, Daniel Nkameni

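AI summary This paper proposes SwiGAN, a Wasserstein GAN-based climate scenario generation framework that generates future spatio-temporal trajectories of climatic indices to support risk management and insurance strategy design. The method focuses on the Soil Wetness Index (SWI), a key indicator used in France to assess drought severity, and simulates its possible evolution up to 2050, helping to understand drought dynamics under climate change. The model not only helps in designing adaptive risk responses but can also be extended to other climate-related risks and actuarial applications.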

Details
English abstract

According to the United Nations Office for Disaster Risk Reduction (2025), the average annual cost of natural catastrophes increased from 70-80 billion USD between 1970 and 2000 to 180-200 billion USD between 2001 and 2020. Reports from organizations such as the IFOA and the WWF highlight the need for the insurance sector to adapt to this rapidly evolving context by developing medium- to long-term strategies that go beyond the one-year horizon of prudential regulations such as Solvency II. This paper introduces an artificial intelligence framework based on Conditional Generative Adversarial Networks (Conditional GANs) to generate future spatio-temporal trajectories of climatic indices. The approach focuses on the Soil Wetness Index (SWI), a key indicator used in France to assess drought severity. Drought accounts for approximately 30% of the indemnities paid under the French natural catastrophe insurance scheme. The proposed model, SwiGAN, simulates plausible drought propagation patterns up to 2050 for a region of France particularly exposed to this hazard. By generating realistic sequences of SWI maps, SwiGAN provides insights into drought dynamics under climate change scenarios and supports the design of adaptive risk management and insurance strategies. The methodology is also generalizable to other climate-related perils and actuarial applications such as economic scenario generation.

2605.06676 2026-05-11 cs.LG cs.CL

LKV: End-to-End Learning of Head-wise Budgets and Token Selection for LLM KV Cache Eviction

Enshuai Zhou, Yifan Hao, Chao Wang, Rui Zhang, Di Huang, Jiaming Guo, Xing Hu, Zidong Du, Qi Guo, Yunji Chen

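AI summary Large language models (LLMs) face a bottleneck in long-context inference: key-value (KV) cache memory grows linearly. Existing methods mostly compress the cache with heuristic strategies that are hard to align with task objectives. This paper proposes LKV, which formulates KV cache compression as an end-to-end differentiable optimization problem and, by learning task-oriented global budget allocation and KV importance estimation, improves both compression efficiency and inference quality. Experiments show that LKV achieves leading compression performance on multiple benchmarks and remains near-lossless even when only 15% of the KV cache is retained.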

Details
English abstract

Long-context inference in Large Language Models (LLMs) is bottlenecked by the linear growth of Key-Value (KV) cache memory. Existing KV cache compression paradigms are fundamentally limited by heuristics: heuristic budgeting relies on statistical priors rather than task objectives, causing resource misallocation, while heuristic selection relies on coupled query-key interactions or static inductive biases (e.g., attention sinks). To address this limitation, we introduce LKV (Learned KV Eviction), which formulates KV compression as an end-to-end differentiable optimization problem. LKV integrates LKV-H to learn task-optimized global budgets, and LKV-T to derive intrinsic KV importance without materializing attention matrices. This design bypasses heuristic proxies, strictly aligning compression with task objectives. Extensive evaluations demonstrate that LKV achieves state-of-the-art performance on both LongBench and RULER benchmarks at high compression rates. In particular, on LongBench, LKV achieves near-lossless performance with only 15% KV cache retention. Crucially, our analysis identifies learned budgeting as the dominant driver of fidelity, demonstrating that data-driven allocation is essential to overcome the limitations of hand-crafted heuristics.

2605.06675 2026-05-11 cs.LG cs.CL cs.IT math.IT

RateQuant: Optimal Mixed-Precision KV Cache Quantization via Rate-Distortion Theory

Fei Zuo, Zikang Zhou, Hao Cong, Xiaoyan Xi, Ho Fai Leung

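AI summary During generation, large language models must cache all previously computed key-value (KV) pairs; as sequence length increases, the memory occupied by the KV cache grows rapidly and becomes the main bottleneck at serving time. Existing methods typically quantize all attention heads at a fixed precision, ignoring differences in importance across heads. This paper proposes RateQuant, which draws on rate-distortion theory: it calibrates a distortion model for each quantizer and uses the reverse waterfilling algorithm to obtain an optimal mixed-precision allocation, improving quantization performance and significantly lowering perplexity, with efficient calibration and no additional overhead at inference time.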

Comments 18 pages, 7 figures, 5 tables

Details
English abstract

Large language models cache all previously computed key-value (KV) pairs during generation, and this KV cache grows linearly with sequence length, making it a primary memory bottleneck for serving. Quantizing the KV cache to fewer bits reduces this cost, yet all current quantizers assign the same bit-width to every attention head, ignoring the large variation in head importance. A natural idea is to allocate more bits to important heads and fewer to the rest. We show, however, that such mixed-precision allocation has a hidden pitfall: each quantizer follows a different distortion curve D(b)=alpha*beta^{-b}, and the decay rate beta varies from 3.6 to 5.3 across quantizer designs. Applying one quantizer's distortion model to another inverts the allocation order and makes performance worse than uniform quantization. We call this failure mode distortion model mismatch and propose RateQuant to resolve it. RateQuant fits a per-quantizer distortion model from a small calibration set, then solves the resulting bit-allocation problem in closed form via reverse waterfilling from rate-distortion theory. On Qwen3-8B at 2.5 average bits, calibrated RateQuant reduces KIVI's perplexity from 49.3 to 14.9 (70% reduction) and improves QuaRot by 6.6 PPL. The entire calibration takes 1.6 s on a single GPU and adds zero overhead at inference time.
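To make the closed-form allocation concrete, here is a minimal sketch under simplifying assumptions that are not taken from the paper: every head uses the same quantizer, so all heads share one decay rate `beta` in D(b) = alpha * beta^(-b), and each head i has a hypothetical positive importance weight w_i. Minimizing the weighted distortion sum subject to a fixed total bit budget gives b_i = B/n + log_beta(w_i) - mean_j log_beta(w_j); heads whose allocation would be negative are set to zero and the budget is re-split among the rest, which is the reverse-waterfilling step. Bit-widths are left real-valued here; rounding to hardware-supported widths is outside this sketch.

```python
import math


def allocate_bits(weights, avg_bits, beta):
    """Real-valued bit allocation for D(b) = alpha * beta**(-b) with a shared beta.

    weights: positive per-head importance weights (hypothetical inputs).
    avg_bits: average bit budget per head.
    beta: distortion decay rate of the quantizer (> 1).
    """
    n = len(weights)
    total = avg_bits * n
    bits = [0.0] * n
    active = list(range(n))
    while active:
        logs = {i: math.log(weights[i], beta) for i in active}
        mean_log = sum(logs.values()) / len(active)
        share = total / len(active)
        trial = {i: share + logs[i] - mean_log for i in active}
        negative = {i for i in active if trial[i] < 0}
        if not negative:
            for i in active:
                bits[i] = trial[i]
            break
        # Reverse waterfilling: heads below the water level get zero bits,
        # and the full budget is re-split among the remaining heads.
        active = [i for i in active if i not in negative]
    return bits


bits = allocate_bits([8.0, 1.0, 1.0], avg_bits=2.0, beta=4.0)
clamped = allocate_bits([1000.0, 1.0, 1.0], avg_bits=0.5, beta=4.0)
```

The formula also shows why a mismatched decay rate matters: the offsets log_beta(w_i) shrink as beta grows, so fitting beta on one quantizer and applying it to another changes how far the allocation departs from uniform.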

2605.06673 2026-05-11 cs.CL cs.AI cs.LG

Domain-level metacognitive monitoring in frontier LLMs: A 33-model atlas

Jon-Paul Cacioli

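AI summary This study analyzes the metacognitive monitoring ability of 33 frontier large language models across domains of the MMLU benchmark and finds that, even when overall performance is good, monitoring ability varies significantly from domain to domain. By computing Type-2 AUROC for each model-domain combination, it shows that the Applied/Professional knowledge domain is the easiest to monitor while Formal Reasoning and Natural Science are the hardest, and that the distribution of monitoring ability within model families exhibits significant clustering. The results indicate that aggregate metrics can mask real differences in model performance across application domains, underscoring the importance of domain-level evaluation before deployment.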

Comments 25 pages, 7 figures, 1 supplementary table. Code and data: https://github.com/synthiumjp/metacognitive-profile-atlas

Details
English abstract

Aggregate metacognitive quality scores mask within-model variation across MMLU benchmark domains. We administered 1,500 MMLU items (250 per domain, under an a priori six-domain grouping) to 33 frontier LLMs from eight model families and computed Type-2 AUROC per model-domain cell using verbalized confidence (0-100). Total observations: 47,151. Every model with above-chance aggregate monitoring showed non-trivial domain-level variation. Applied/Professional knowledge was reliably the easiest benchmark domain to monitor (mean AUROC = .742, ranked top-2 in 21 of 33 models); Formal Reasoning and Natural Science were reliably the hardest (one of the two ranked bottom-2 in 27 of 33 models). The three middle domains were statistically indistinguishable (Kendall's W = .164). A subject-level coherence analysis (within-domain similarity ratio = 0.95) confirms the six-domain grouping is a pragmatic benchmark taxonomy, not a validated latent construct. Within-family profile-shape clustering is significant for Anthropic, Google-Gemini, and Qwen (permutation p < .0001) but not DeepSeek, Google-Gemma, or OpenAI. Gemma 4 31B showed a +.202 AUROC improvement over Gemma 3 27B. Three models classified Invalid on binary KEEP/WITHDRAW probes produced normal profiles under verbalized confidence, confirming probe-format specificity. Bootstrap 95% CIs on 198 cells have median width .199. Split-half aggregate stability r = .893; profile-level split-half is weaker (grand median r = .184). These results show stable benchmark-domain variation obscured by aggregate metrics, and support benchmark-stage domain screening as a step before deployment in specific application areas.
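For readers unfamiliar with the metric, Type-2 AUROC can be read as the probability that a randomly chosen correct answer received higher confidence than a randomly chosen incorrect one, with ties counted as one half. The sketch below computes it by direct pair counting for a single model-domain cell; the function name and the toy data are illustrative assumptions, not the authors' code.

```python
def type2_auroc(confidences, correct):
    """Type-2 AUROC from per-item confidence (e.g. 0-100) and correctness flags."""
    hits = [c for c, ok in zip(confidences, correct) if ok]
    misses = [c for c, ok in zip(confidences, correct) if not ok]
    if not hits or not misses:
        raise ValueError("need at least one correct and one incorrect item")
    wins = 0.0
    for h in hits:
        for m in misses:
            if h > m:
                wins += 1.0      # correct item ranked above incorrect item
            elif h == m:
                wins += 0.5      # ties count as half
    return wins / (len(hits) * len(misses))


# Toy cell: four items with verbalized confidence and correctness.
well_monitored = type2_auroc([90, 80, 60, 40], [True, True, False, False])
partly_monitored = type2_auroc([90, 40, 60, 80], [True, False, True, False])
```

A value of 0.5 corresponds to chance-level monitoring and 1.0 to confidence that perfectly separates correct from incorrect answers.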

2605.06672 2026-05-11 cs.AI cs.CL cs.LG

More Thinking, More Bias: Length-Driven Position Bias in Reasoning Models

Xiao Wang

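AI summary This study examines position bias in reasoning models on multiple-choice question answering and finds that, even for reasoning-capable models, position bias rises as the reasoning trajectory gets longer. Experiments across multiple models and datasets show that the longer the reasoning trajectory, the more likely the model is to favor particular option positions, a phenomenon found across models of different sizes. The study also notes that direct-answer position bias is unrelated to trajectory length, whereas the reasoning process introduces a different, length-dependent bias, and it offers diagnostic tools for detecting and assessing position bias.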

Details
English abstract

Chain-of-thought (CoT) reasoning and reasoning-tuned models such as DeepSeek-R1 are commonly assumed to reduce shallow heuristic biases by thinking carefully. We test this on position bias in multiple-choice QA and find a different story: within any reasoning-capable model, per-question position bias scales with the length of the reasoning trajectory. Across thirteen reasoning-mode configurations (two R1-distilled 7-8B models, two base models prompted with CoT, and DeepSeek-R1 at 671B) on MMLU, ARC-Challenge, and GPQA, twelve show a positive partial correlation between trajectory length and Position Bias Score (PBS) after controlling for accuracy, ranging from 0.11 to 0.41 (all p < 0.05). All twelve open-weight reasoning-mode configurations show monotonically increasing PBS across length quartiles. A truncation intervention provides causal evidence: continuations resumed from later points in the trajectory are increasingly likely to shift toward position-preferred options (16% to 32% for R1-Qwen-7B across absolute-position buckets). At 671B, aggregate PBS collapses to 0.019, but the length effect still manifests in the longest quartile (PBS = 0.071), suggesting that accuracy gates the expression of length-driven bias rather than eliminating the underlying mechanism. We additionally find that direct-answer position bias is a distinct phenomenon with a different footprint (strong in Llama-Instruct-direct, weak in Qwen-Instruct-direct, and uncorrelated with trajectory length): CoT reasoning replaces this baseline bias with length-accumulated bias. Our results argue that reasoning-capable models should not be treated as order-robust by default in MCQ evaluation pipelines, and offer a diagnostic toolkit (PBS, commitment change point, effective switching, truncation probes) for auditing position bias in reasoning models.
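The abstract reports a partial correlation between trajectory length and PBS after controlling for accuracy. The sketch below shows the standard first-order partial correlation formula, r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)), on synthetic numbers. How PBS itself is computed is not specified in the abstract, so the `pbs` values here are made-up placeholders, and the authors may have used a different estimator.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def partial_correlation(x, y, z):
    """Correlation of x and y after linearly controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Synthetic per-question data (placeholders, not results from the paper).
trajectory_length = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
accuracy = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
pbs = [length + 3.0 * acc for length, acc in zip(trajectory_length, accuracy)]

raw = pearson(trajectory_length, pbs)
controlled = partial_correlation(trajectory_length, pbs, accuracy)
```

In this constructed example the raw correlation is diluted by the accuracy term, while the partial correlation recovers the exact linear dependence on length once accuracy is held fixed.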

2605.06671 2026-05-11 cs.AI cs.MA

GraphDC: A Divide-and-Conquer Multi-Agent System for Scalable Graph Algorithm Reasoning

Wenjin Li, Jiaming Cui

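AI summary This paper proposes GraphDC, a divide-and-conquer multi-agent system for large-scale graph algorithm reasoning tasks. The method decomposes the input graph into subgraphs, has specialized agents reason locally, and then has a master agent integrate the results, reducing the reasoning burden on any single agent and improving computational efficiency and robustness. Experiments show that GraphDC outperforms existing methods across a variety of graph algorithm tasks, with a particularly clear advantage on large graph instances.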

Details
English abstract

Large Language Models (LLMs) have demonstrated strong potential on many mathematical problems. However, their performance on graph algorithmic tasks remains unsatisfactory, since graphs are topologically more complex and often require systematic multi-step reasoning, especially on larger graphs. Motivated by this gap, we propose GraphDC, a Divide-and-Conquer multi-agent framework for scalable graph algorithm reasoning. Specifically, inspired by Divide-and-Conquer design, GraphDC decomposes an input graph into smaller subgraphs, assigns each subgraph to a specialized agent for local reasoning, and uses a master agent to integrate the local outputs with inter-subgraph information to produce the final solution. This hierarchical design reduces the reasoning burden on individual agents, alleviates computational bottlenecks, and improves robustness on large graph instances. Extensive experiments show that GraphDC consistently outperforms existing methods on graph algorithm reasoning across diverse tasks and scales, especially on larger instances where direct end-to-end reasoning is less reliable.

2605.06623 2026-05-11 cs.AI cs.CL cs.LG cs.MA

MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems

Zhexuan Wang, Xuebo Liu, Li Wang, Zifei Shan, Yutong Wang, Zhenxi Song, Min Zhang

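AI summary This paper proposes MASPO, a new framework for jointly optimizing role prompts in LLM-based multi-agent systems to improve overall collaborative performance. MASPO introduces a joint evaluation mechanism that optimizes each agent's local prompt with respect to the global system objective, avoiding the mismatch between local and overall objectives found in conventional methods. In addition, MASPO uses a data-driven evolutionary beam search to explore the high-dimensional prompt space efficiently; experiments show that it outperforms existing methods on multiple tasks, improving average accuracy by 2.9 percentage points.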

Comments Accepted at ICML 2026

Details
English abstract

Large language model (LLM)-based Multi-agent systems (MAS) have shown promise in tackling complex collaborative tasks, where agents are typically orchestrated via role-specific prompts. While the quality of these prompts is pivotal, jointly optimizing them across interacting agents remains a non-trivial challenge, primarily due to the misalignment between local agent objectives and holistic system goals. To address this, we introduce MASPO, a novel framework designed to automatically and iteratively refine prompts across the entire system. A core innovation of MASPO is its joint evaluation mechanism, which assesses prompts not merely by their local validity, but by their capacity to facilitate downstream success for successor agents. This effectively bridges the gap between local interactions and global outcomes without relying on ground-truth labels. Furthermore, MASPO employs a data-driven evolutionary beam search to efficiently navigate the high-dimensional prompt space. Extensive empirical evaluations across 6 diverse tasks demonstrate that MASPO consistently outperforms state-of-the-art prompt optimization methods, achieving an average accuracy improvement of 2.9. We release our code at https://github.com/wangzx1219/MASPO.

2605.06435 2026-05-11 cs.CL cs.AI cs.LG

COVID-19 Infodemic. Understanding content features in detecting fake news using a machine learning approach

Vimala Balakrishnan, Lee Zing Hii, Eric Laporte

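AI summary This paper studies fake news detection using textual and linguistic features with traditional machine learning methods, experimenting with features such as word bigrams and part-of-speech distribution. On a dataset collected during the COVID-19 pandemic, Random Forest and Support Vector Machine gave the best detection results; textual or linguistic features used on their own improved detection performance, but combining the two did not improve it significantly. The study shows that traditional machine learning methods can effectively use textual and linguistic features to identify fake news without relying on deep learning.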

Details
Journal ref
Malaysian Journal of Computer Science, 2023, 36 (1), pp.1-13
English abstract

The use of content features, particularly textual and linguistic ones, for fake news detection is under-researched, despite empirical evidence that these features can help differentiate real from fake news. To this end, this study investigates a selection of content features, such as word bigrams and part-of-speech distribution, to improve fake news detection. We performed a series of experiments on a new dataset gathered during the COVID-19 pandemic using Decision Tree, K-Nearest Neighbor, Logistic Regression, Support Vector Machine and Random Forest. Random Forest yielded the best results, followed closely by Support Vector Machine, across all setups. In general, both the textual and linguistic features were found to improve fake news detection when used separately; however, combining them into a single model did not improve the detection significantly. Differences were also noted between the use of bigrams and part-of-speech tags. The study shows that textual and linguistic features can be used successfully in detecting fake news using the traditional machine learning approach as opposed to deep learning.

2605.06298 2026-05-11 cs.CV cs.AI

Render, Don't Decode: Weight-Space World Models with Latent Structural Disentanglement

Roussel Desmond Nzoyem, Mauro Comi

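AI summary This paper proposes NOVA, a world model framework that represents the system state as the weights and biases of an auxiliary coordinate-based implicit neural representation (INR), addressing conventional world models' reliance on heavy decoders and uninterpretable latent spaces. By analytically rendering the structured representation, the method removes the decoder bottleneck and provides compactness, portability, and zero-shot super-resolution. Experiments show that NOVA runs efficiently on a single consumer GPU, achieves controllable forecasting, and can separate structural scene components such as background, foreground, and motion, enabling independent editing of content and dynamics.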

Comments 35 pages, 30 figures, 8 tables

Details
English abstract

Training world models on vast quantities of unlabelled videos is a critical step toward fully autonomous intelligence. However, the prevailing paradigm of encoding raw pixels into opaque latent spaces and relying on heavy decoders for reconstruction leaves these models computationally expensive and uninterpretable. We address this problem by introducing NOVA, a world modelling framework that represents the system state as the weights and biases of an auxiliary coordinate-based implicit neural representation (INR). This structured representation is analytically rendered, which eliminates the decoder bottleneck while conferring compactness, portability, and zero-shot super-resolution. Furthermore, like most latent action models, NOVA can be distilled into a context-dependent video generator via an action-matching objective. Surprisingly, without resorting to auxiliary losses or adversarial objectives, NOVA can disentangle structural scene components such as background, foreground, and inter-frame motion, enabling users to edit either content or dynamics without compromising the other. We validate our framework on several challenging datasets, achieving strong controllable forecasting while operating on a single consumer GPU at ~40M parameters. Ultimately, structured representations like INRs not only enhance our understanding of latent dynamics but also pave the way for immersive and customisable virtual experiences.

2605.06230 2026-05-11 cs.AI cs.DC

Safactory: A Scalable Agentic Infrastructure for Training Trustworthy Autonomous Intelligence

Xinquan Chen, Zhenyun Yin, Shan He, Bin Huang, Shanzhe Lei, Pengcheng Shi, Kun Cai, Bei Chen, Bangwei Liu, Zeyu Kang, Chao Huang, Yang Zhang, Wenjie Li, Ruijun Ge, Yajie Wang, Tianshun Fang, Tianyang Xu, Yiwen Cong, Meng Jin, Gaolei Li, Xuansheng Wu, Linhan Liu, Zijing He, An Li, Yan Teng, Xin Tan, Dongrui Liu, Jing Shao, ChaoChao Lu, Ji He, Jie Li, Chunfeng Song, Jinya Xu, Fan Song, Shujie Wang, Jianmin Qian, Jie Hou, Xuhong Wang, Yingchun Wang, Hui Wang, Xia Hu

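AI summary As large models evolve from conversational assistants into autonomous agents, the challenges posed by long-horizon decision making, tool use, and interaction with real environments are becoming increasingly prominent. Existing agent infrastructure is fragmented across evaluation, data management, and agent evolution, making it hard to discover risks systematically and to achieve continuous closed-loop improvement. This paper presents Safactory, a scalable agent factory framework that integrates three platforms, for trajectory generation, trustworthy data management, and autonomous evolution, into a unified evolutionary pipeline, providing new infrastructure for next-generation trustworthy autonomous agents.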

Comments 50 pages, 21 figures

Details
English abstract

As large models evolve from conversational assistants into autonomous agents, challenges increasingly arise from long-horizon decision making, tool use, and real environment interaction. Existing agentic infrastructure remains fragmented across evaluation, data management, and agent evolution, making it difficult to discover risks systematically and improve models in a continuous closed loop. In this report, we present Safactory, a scalable agent factory for trustworthy autonomous intelligence. Safactory integrates three tightly coupled platforms: a Parallel Simulation Platform for trajectory generation, a Trustworthy Data Platform for trajectory storage and experience extraction, and an Autonomous Evolution Platform for asynchronous reinforcement learning and on-policy distillation. As far as we know, Safactory is the first framework to propose a unified evolutionary pipeline for next-generation trustworthy autonomous intelligence.

2605.06175 2026-05-11 cs.RO

VLA-GSE: Boosting Parameter-Efficient Fine-Tuning in VLA with Generalized and Specialized Experts

Yuhua Jiang, Junjie Lu, Xinyao Qin, Xiaoyu Chen, Kaixin Wang, Feifei Gao, Li Zhao

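AI summary This paper proposes VLA-GSE, a parameter-efficient fine-tuning framework for vision-language-action (VLA) models, aimed at the challenge of adapting pre-trained VLA models to robotic control tasks. The method spectrally decomposes the frozen backbone, assigning the leading singular components to generalized experts (shared experts) and the disjoint residual components to specialized experts (routed experts), which improves adaptation capacity under a fixed trainable-parameter budget. Experiments show that VLA-GSE preserves pre-trained knowledge while updating only 2.51% of the full model's parameters, significantly outperforms full fine-tuning and existing PEFT methods, and delivers strong zero-shot control and multimodal understanding performance on multiple benchmarks.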

Details
English abstract

Vision-language-action (VLA) models inherit rich visual-semantic priors from pre-trained vision-language backbones, but adapting them to robotic control remains challenging. Full fine-tuning (FFT) is prone to overfitting on downstream robotic data and catastrophic forgetting of pretrained vision-language capabilities. Parameter-efficient fine-tuning (PEFT) better preserves pre-trained knowledge, yet existing PEFT methods still struggle to adapt effectively to robot control tasks. To address this gap, we propose VLA-GSE, a parameter-efficient VLA fine-tuning framework that improves control adaptation while retaining PEFT's knowledge preservation advantage. Specifically, VLA-GSE (Generalized and Specialized Experts) is initialized by spectrally decomposing the frozen backbone, assigning leading singular components to generalized experts (shared experts) and disjoint residual components to specialized experts (routed experts). This decomposition improves adaptation capacity under a fixed trainable-parameter budget. Under a comparable parameter budget, VLA-GSE updates only 2.51% of the full model parameters and consistently outperforms strong FFT and PEFT baselines. It achieves 81.2% average zero-shot success on LIBERO-Plus, preserves pre-trained VLM capability comparably to LoRA on multimodal understanding benchmarks, and improves real-world manipulation success under multiple distribution shifts. Code is available at: https://github.com/YuhuaJiang2002/VLA-GSE

2605.06169 2026-05-11 cs.LG cs.CV

Mean Mode Screaming: Mean--Variance Split Residuals for 1000-Layer Diffusion Transformers

Pengqi Lu

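AI summary This paper studies a structural vulnerability that arises when Diffusion Transformers (DiTs) are scaled to hundreds of layers: the network can fall into a mean-dominated collapse state in which feature representations homogenize and variation is suppressed. Through mechanistic analysis, the author identifies the trigger of this collapse, Mean Mode Screaming (MMS), and proposes Mean-Variance Split (MV-Split) Residuals, which separate the mean and variance gradient updates and effectively prevent collapse in deep networks; the method is validated by stable training of a 1000-layer DiT.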

Comments 43 pages (9-page main paper + appendix)

Details
English abstract

Scaling Diffusion Transformers (DiTs) to hundreds of layers introduces a structural vulnerability: networks can enter a silent, mean-dominated collapse state that homogenizes token representations and suppresses centered variation. Through mechanistic auditing, we isolate the trigger event of this collapse as Mean Mode Screaming (MMS). MMS can occur even when training appears stable, with a mean-coherent backward shock on residual writers that opens deep residual branches and drives the network into a mean-dominated state. We show this behavior is driven by an exact decomposition of these gradients into mean-coherent and centered components, compounded by the structural suppression of attention-logit gradients through the null space of the Softmax Jacobian once values homogenize. To address this, we propose Mean-Variance Split (MV-Split) Residuals, which combine a separately gained centered residual update with a leaky trunk-mean replacement. On a 400-layer single-stream DiT, MV-Split prevents the divergent collapse that crashes the un-stabilized baseline; it tracks close to the baseline's pre-crash trajectory while remaining substantially better than token-isotropic gating methods such as LayerScale across the full schedule. Finally, we present a 1000-layer DiT as a scale-validation run at boundary scales, establishing that the architecture remains stably trainable at extreme depth.
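The remark about the Softmax Jacobian's null space can be checked on a toy single-query attention step. The softmax Jacobian is diag(p) - p p^T, which maps any constant vector to zero. If all value vectors are identical, the gradient of the loss with respect to each attention weight is the same number, so the gradient reaching the logits vanishes. The sketch below is a minimal illustration of that standard fact; the function names, shapes, and numbers are assumptions for this example and it does not reproduce the paper's analysis of residual writers.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_gradient(logits, values, output_grad):
    """Gradient w.r.t. attention logits for output = sum_j p_j * values[j]."""
    p = softmax(logits)
    # dL/dp_j is the inner product of the output gradient with value j.
    upstream = [sum(g * v for g, v in zip(output_grad, value)) for value in values]
    # Apply the softmax Jacobian diag(p) - p p^T to the upstream gradient.
    dot = sum(pi * ui for pi, ui in zip(p, upstream))
    return [pi * (ui - dot) for pi, ui in zip(p, upstream)]


logits = [0.3, -1.2, 2.0]
output_grad = [1.0, -0.5]

distinct_values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
homogenized_values = [[0.7, 0.7], [0.7, 0.7], [0.7, 0.7]]

grad_distinct = logit_gradient(logits, distinct_values, output_grad)
grad_homogenized = logit_gradient(logits, homogenized_values, output_grad)
```

With distinct values the logits receive a nonzero gradient; once the values are homogenized the gradient is zero up to rounding, so the attention pattern can no longer be corrected through this path.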