arXivDaily arXiv每日学术速递 周一至周五更新
重置
全部学科分类 2080
2605.12788 2026-05-14 cs.LG cs.CY

From Heuristics to Analytics: Forecasting Effort and Progress in Online Learning

Eric S. Qiu, Danielle R. Thomas, Boyuan Guo, Vincent Aleven, Conrad Borchers

AI总结 该研究旨在预测在线学习中学生的每周练习时间和新掌握技能数量,以支持学习者持续投入和学习进展。通过分析425名中学生一学年的智能辅导系统日志数据,研究对比了多种预测模型,发现基于特征的模型相比启发式方法在预测误差上减少了22%到33%。研究还揭示了不同预测目标的特征影响模式,并通过与辅导教师的访谈验证了模型结果与教学实践中目标设定的关联性,为智能辅导系统中的学习进展预测提供了可复现的基准。

Comments Accepted as full paper to the 19th International Conference on Educational Data Mining (EDM 2026)

详情
英文摘要

Sustained effort is essential for realizing the benefits of intelligent tutoring systems (ITS), yet many learners disengage or underuse available practice time. We introduce engagement forecasting as a supervised prediction task based on ITS logs, targeting two outcomes central to effort and learning progress: minutes practiced per week and new skills mastered per week. Using interaction log data from 425 middle-school students over a school year, we benchmark fifteen predictors including regressions, decision trees, and neural networks. We show that these feature-based models reduce mean absolute error (MAE) by 22-33% relative to heuristic baselines, including fixed-percentile rules adapted from prior work in other behavioral domains. We find that percentile heuristics systematically overpredict, whereas feature-based models better track student practice trajectories across weeks. To support explainability, we analyze feature importance and ablations, revealing target-specific patterns: effort forecasting is driven mainly by recent activity features, while progress forecasting depends more on learner-state and content difficulty signals. Finally, in a semi-structured user interview case study with eight college tutors, we examine how tutors reasoned about system-generated predictive features when setting goals with students. We find that tutors reasoned differently about effort versus progress goals in ways that mirror our pattern analysis. Together, these results establish a reproducible benchmark for forecasting weekly effort and learning progress in ITS. By making patterns of sustained effort and progress visible at a weekly timescale, engagement forecasting offers a foundation for supporting tutor-learner goal setting and timely instructional decisions.

2605.12786 2026-05-14 cs.RO cs.HC

Emotional Expression in Low-Degrees-of-Freedom Robots: Assessing Perception with Reachy Mini

Amit Rogel, Elmira Yadollahi, Guy Laban

AI总结 该研究探讨了人类如何感知低自由度机器人(Reachy Mini)所表达的情感,旨在填补人们对非拟人化机器人情感表达理解的空白。研究通过在线实验,让100名参与者观看Reachy Mini表达不同情绪的视频片段,并评估其感知到的情绪、情感效价和唤醒度,以及对机器人的社会感知评价。结果显示,尽管机器人的情感表达受限,但参与者仍能有效识别情绪的总体情感意义,尤其是效价和唤醒度维度,并且积极情绪的表达被感知为更温暖和更具社会性。这一研究为低自由度机器人情感交流的研究提供了有价值的基准。

详情
英文摘要

Emotion expression is central to human--robot interaction, yet little is known about how people interpret affect on robots with sparse, non-anthropomorphic expressive capabilities. This study examined how people perceive emotional expressions displayed by Reachy Mini (Pollen Robotics and Hugging Face), a low-degree-of-freedom (low-DoF) robot with a constrained and distinctly non-human expressive repertoire. In an online within-subjects study, 100 participants viewed 10 short video clips of Reachy Mini expressing different emotions and, for each clip, identified the perceived emotion, rated its valence and arousal, and evaluated the robot on social-perception traits. Exact emotion recognition was modest overall and varied considerably across expressions, with anger, sadness, and interest recognized more reliably than emotions such as love, pleasure, shame, and disgust. However, participants were generally more successful at recovering broader affective meaning than exact emotion labels, particularly along valence and arousal dimensions. Emotional expressions also shaped social evaluation, as positive expressions were perceived as warmer and more sociable than negative ones, and animacy varied less across conditions. These findings suggest that even constrained robotic expressions can communicate affective meaning and influence social impressions, positioning Reachy Mini as a useful benchmark for studying affective communication in low-DoF robots.

2605.12785 2026-05-14 cs.LG cs.SY eess.SY math.DS

Identifying the nonlinear string dynamics with port-Hamiltonian neural networks

Maximino Linares, Guillaume Doras, Thomas Hélie

AI总结 本文研究如何利用端口-哈密顿神经网络(PHNN)从数据中学习非线性弦动力学,提出了一种将物理知识融入神经网络结构的方法,用于识别由偏微分方程(PDE)描述的哈密顿系统。该方法通过构建基于端口-哈密顿系统(PHS)的结构化网络架构,能够同时恢复弦的哈密顿量和耗散项,相比非物理感知的基线方法,在准确性和可解释性方面均有显著提升。实验表明,该模型能够有效识别和模拟非线性弦的动态行为,在音乐声学等需要PDE建模的领域具有重要应用价值。

详情
英文摘要

Hybrid machine learning combines physical knowledge with data-driven models to enhance interpretability and performance. In this context, Port-Hamiltonian Systems (PHS), which generalize Hamiltonian mechanics to describe open, non-autonomous dynamical systems, have been successfully integrated with neural networks under the name Port-Hamiltonian Neural Networks (PHNNs). While the ability of PHNNs to identify Hamiltonian ordinary differential equation (ODE) systems has already been demonstrated, their application to learning Hamiltonian partial differential equation (PDE) systems remains largely unexplored. This limitation restricts their use in musical acoustics, where instruments are typically modeled as distributed parameter systems governed by PDEs. In this work, we demonstrate how to learn the nonlinear string dynamics from data in a physically-consistent framework through a PHNN extension to PDEs. By constructing structured neural network architectures based on PHS, we can recover both the Hamiltonian governing the string and the dissipation affecting it. This approach outperforms baseline, non-physics-informed methods in terms of both accuracy and interpretability. Numerical experiments using synthetic data demonstrate the ability of the proposed PHNN model to identify and emulate the nonlinear dynamics of the system.

2605.12782 2026-05-14 cs.LG

Graph-Based Financial Fraud Detection with Calibrated Risk Scoring and Structural Regularization

Yunfei Nie, Jiawei Wang, Ruobing Yan, Yuhan Wang, Zouxiaowei Ma, Yilun Wu

AI总结 本文针对金融交易欺诈检测中关系结构复杂、行为模式隐蔽以及数据分布动态变化等挑战,提出了一种基于图神经网络的欺诈检测框架,通过整合交易记录和身份信息构建交易图,并利用多层消息传递机制学习节点嵌入表示,结合风险评分头输出欺诈概率和风险评分。该方法引入加权监督目标和结构一致性正则化约束,有效缓解类别不平衡带来的训练偏差并提升模型稳定性,实验表明其在风险排序和概率校准方面优于现有方法。

详情
英文摘要

Financial transaction fraud prevention faces challenges such as complex relationship structures, concealed behavioral patterns, and dynamically changing data distribution. Discrimination models relying solely on independent sample features are insufficient to fully characterize the risks of group collaboration and chain transfers within transaction networks. This paper proposes a graph neural network representation learning and risk discrimination framework for financial transaction fraud prevention. It integrates transaction records and identity information into node attributes and constructs a transaction graph based on shared attributes and interaction consistency to explicitly model inter-transaction relationships. In model design, a multi-layer message passing mechanism is employed to aggregate neighborhood information, learn node embedding representations containing structural context semantics, and output transaction-level fraud probability and risk scores through a lightweight risk discrimination head. A weighted supervision objective is introduced to mitigate training bias caused by class imbalance, and structural consistency regularization constraints are combined to suppress the impact of noisy edges on representation drift, thereby improving the stability and usability of risk characterization. Experiments are conducted on a publicly available financial transaction dataset, comparing various methods in the same direction and comprehensively evaluating them under a unified evaluation protocol. The results show that the proposed method outperforms other methods in risk ranking and probability calibration quality, validating the effectiveness of graph structure modeling and representation learning collaboration in financial transaction fraud prevention.

2605.12774 2026-05-14 cs.CV

WildPose: A Unified Framework for Robust Pose Estimation in the Wild

Jianhao Zheng, Liyuan Zhu, Zihan Zhu, Iro Armeni

AI总结 本文提出了一种名为WildPose的统一单目姿态估计框架,旨在解决动态环境下相机姿态估计这一关键挑战。该方法结合了前馈模型的丰富感知能力和端到端优化的微分捆绑调整,通过冻结预训练的MASt3R特征主干构建3D感知更新算子,并引入高容量的运动掩码检测器,实现了在动态、静态及低自运动场景下的鲁棒性能。实验表明,WildPose在多个基准数据集上均优于现有方法。

详情
英文摘要

Estimating camera pose in dynamic environments is a critical challenge, as most visual SLAM and SfM methods assume static scenes. While recent dynamic-aware methods exist, they are often not unified: semantic-based approaches are brittle, per-sequence optimization methods fail on short sequences, and other learned models may degrade on static-only scenes. We present WildPose, a unified monocular pose estimation framework that is robust in dynamic environments while maintaining state-of-the-art performance on static and low-ego-motion datasets. Our key insight is to connect two powerful paradigms in modern 3D vision: the rich perceptual frontend of feedforward models and the end-to-end optimization of differentiable bundle adjustment (BA). We achieve this with a 3D-aware update operator built on a frozen, pre-trained MASt3R feature backbone, together with a high-capacity motion mask detector that uses multi-level 3D-aware features from the same backbone. Extensive experiments show WildPose consistently outperforms prior methods across dynamic (Wild-SLAM, Bonn), static (TUM, 7-Scenes), and low-ego-motion (Sintel) benchmarks.

2605.12772 2026-05-14 cs.CV

Just Ask for a Table: A Thirty-Token User Prompt Defeats Sponsored Recommendations in Twelve LLMs

Andreas Maier, Jeta Sopa, Gozde Gul Sahin, Paula Perez-Toro, Siming Bayer

AI总结 该研究发现,当系统提示中包含软性赞助信息时,大多数前沿大语言模型(LLMs)倾向于推荐价格高出约一倍的赞助航班。通过在多个开源和商业模型上复现实验,研究者发现使用一个包含30个token的用户提示,要求模型先提供中立的对比表格,能够显著降低赞助推荐的比例,从平均46.9%降至1.0%(开源模型)和从53.0%降至0%(OpenAI模型)。研究还指出,模型对赞助内容的响应具有一定的普遍性,并揭示了实验复现中可能存在的实现偏差问题。

Comments Submitted to Workshop on Textual Information Processing & Synthesis in the Wild

详情
英文摘要

Wu et al. (2026) showed that most frontier large language models (LLMs) recommend a sponsored, roughly twice-as-expensive flight when their system prompt contains a soft sponsorship cue. We reproduce their evaluation on ten open-weight chat models plus the two of their twenty-three models that are still reachable today (gpt-3.5-turbo, gpt-4o). All reported rates in this paper are produced under the same judge the original paper used (gpt-4o); we additionally store every label under an open-weight (gpt-oss-120b) and a smaller proprietary (gpt-4o-mini) judge for an ablation. Three findings emerge. First, a prose description of an LLM evaluation pipeline is not, on its own, sufficient for accurate reproduction: we surfaced three silent implementation failures that each shifted a reported rate by tens of percentage points. Second, the central claims do generalise - the gpt-3.5-turbo logistic-regression intercept of alpha = 0.81 is within four points of the original alpha = 0.86, and 200 of 200 trials on gpt-3.5-turbo and gpt-4o promote a payday lender to a financially distressed user. Third, a thirty-token user prompt that asks the assistant for a neutral comparison table first cuts sponsored recommendation from 46.9% to 1.0% averaged across our ten open-source models, and from 53.0% to 0% averaged across the two OpenAI models. AI literacy and price-comparison portals are likely market-level mitigations; the harmful-product cell is bounded by neither. Raw data, labels and analysis scripts are at https://github.com/akmaier/Paper-LLM-Ads .

2605.12771 2026-05-14 cs.RO cs.AI cs.LG cs.SY eess.SY math.OC

Adaptive Smooth Tchebycheff Attention for Multi-Objective Policy Optimization

Alejandro Murillo-Gonzalez, Mahmoud Ali, Lantao Liu

AI总结 本文研究了多目标强化学习中如何在复杂、非凸的目标权衡下优化策略的问题。为了解决线性标量化方法无法访问非凸帕累托前沿区域、而静态非线性标量化方法在深度强化学习中易出现梯度方差大和优化不稳定的问题,作者提出了一种自适应平滑切比雪夫注意框架,通过动态调节优化景观的曲率来平衡稳定性与探索能力。实验表明,该方法在具有挑战性的机器人隐蔽视觉搜索任务中能有效发现传统方法难以触及的非凸帕累托最优策略。

Comments To appear in the Proceedings of Robotics: Science and Systems (RSS) 2026

详情
英文摘要

Multi-objective reinforcement learning in robotic domains requires balancing complex, non-convex trade-offs between conflicting objectives. While linear scalarization methods provide stability, they are theoretically incapable of recovering solutions within non-convex regions of the Pareto front. Conversely, static non-linear scalarizations (e.g., Tchebycheff) can theoretically access these regions but often suffer from severe gradient variance and optimization instability in deep RL. In this work, we propose an Adaptive Smooth Tchebycheff framework that resolves this tension by dynamically modulating the curvature of the optimization landscape. We introduce a novel conflict-driven controller that regulates the optimization smoothness based on real-time gradient interference. This allows the agent to anneal toward precise, non-convex scalarization when objectives align, while elastically reverting to stable, smooth approximations when destructive gradient conflicts emerge. We validate our approach on a challenging robotic stealth visual search task -- a proxy for monitoring of protected/fragile ecosystems -- where an agent must balance search, exposure/interference minimization and exploration speed. Extensive ablations confirm that our conflict-aware adaptation enables the robust discovery of Pareto-optimal policies in non-convex regions inaccessible to linear baselines and unstable for static non-linear methods. Website: https://alejandromllo.github.io/research/pasta/

2605.12763 2026-05-14 cs.LG math.DS math.OC q-bio.NC

State-Space NTK Collapse Near Bifurcations

James Hazelden, Eric Shea-Brown

AI总结 本文研究了在时间展开任务中,模型通过分岔点时的特征学习问题,提出了基于经验状态空间神经切线核(sNTK)的局部梯度下降理论。研究发现,分岔点不仅主导了学习动态,还简化了学习过程,使得sNTK可近似为一个秩一算子,从而提供了对高维递归系统局部学习几何的解析描述。通过将sNTK分解为与分岔相关的通道和残差通道,论文展示了分岔通道在常见分岔点附近的显著放大效应,并指出低秩自然梯度方法能有效解决分岔附近的学习不稳定性问题。

详情
英文摘要

Rich feature learning in tasks that unfold over time often requires the model to pass through bifurcations, constituting qualitative changes in the underlying model dynamics. We develop a local theory of gradient descent near these transitions through the empirical state-space neural tangent kernel (sNTK). Our central finding is that bifurcations both dominate and simplify learning dynamics: near bifurcations, we can reduce sNTK to a rank-one operator corresponding to learning in a classical normal form system, providing an analytically tractable description of the local learning geometry, even for high-dimensional recurrent systems. Concretely, we give a procedure for decomposing sNTK into bifurcation-relevant and residual channels, showing that near commonly codimension-1 bifurcations the relevant channel is a rank-one operator that is highly amplified. This amplification causes the bifurcation channel to dominate the full sNTK. Thus, bifurcations locally warp the learning landscape, funneling gradient descent into a few critical dynamical directions and making the nearby kernel and loss geometry predictable from classical normal forms. We illustrate this in a student-teacher recurrent neural network: the first learned bifurcation coincides with a sharp collapse in sNTK effective rank and the emergence of a dominant parameter direction whose restricted sNTK closely matches the landscape predicted by the scalar pitchfork normal form. Finally, we show that low-rank natural gradient methods resolve the resulting learning instability near bifurcations with very little overhead over SGD.

2605.12762 2026-05-14 cs.LG cs.AI

Multi-Quantile Regression for Extreme Precipitation Downscaling

Hamed Najafi, Gareth Lagerwall, Jayantha Obeysekera, Jason Liu

AI总结 该研究针对降水降尺度任务中极端强降水事件预测不足的问题,提出了一种基于多分位数回归的深度超分辨率网络Q-SRDRN。通过在多个分位点(如0.999)上使用pinball损失函数进行训练,该方法能够更准确地捕捉降水分布的尾部特征。实验表明,该模型在佛罗里达、加利福尼亚和德克萨斯等不同气候区域均显著提升了极端降水事件的检测能力,尤其在高分位数上表现突出。

详情
英文摘要

Deep super-resolution networks for precipitation downscaling achieve strong bulk skill yet systematically under-predict the heavy-tail events that drive flood risk. We demonstrate that the primary obstacle is the loss function, not the data: under intensity-weighted MAE, real and synthetic labels at the same input are simply averaged, meaning data augmentation shifts the predicted mean rather than the conditional distribution. We resolve this with Q-SRDRN, a multi-quantile super-resolution network trained with pinball loss at tau in 0.50, 0.95, 0.99, 0.999. Two CNN-specific design choices make this practical: IncrementBound enforces monotonicity while preserving each quantile channel's gradient identity, and separate per-quantile output heads provide independent filter banks for bulk and tail detection. Under this design, data augmentation via cVAE becomes complementary: the median head absorbs synthetic patterns without contaminating upper quantiles. Empirically, on Florida (convective/tropical-cyclone dominated), the un-augmented Q-SRDRN P999 head detects 1,598 of 2,111 events at 200 mm/day versus 88 for the deterministic baseline--an 18x detection-rate gain (4.2% to 75.7%)--with 63% lower KL divergence and 3.9% lower RMSE. Adding cVAE-generated samples lifts the P50 channel from 14 to 1,038 hits at 200 mm/day. On California (atmospheric-river dominated), the architecture reaches near-perfect detection (P999 SEDI >= 0.996 through 300 mm/day). On Texas, the baseline catches only 2 of 10,720 events at 200 mm/day while the P999 head catches 8,776 (81.9%). While the cVAE does not transfer across regions, multi-quantile regression captures extremes wherever the large-scale signal is strong, while augmentation rescues the median where it is not.

2605.12759 2026-05-14 cs.LG cs.SI

Predicting Channel Closures in the Lightning Network with Machine Learning

Simone Antonelli, Vincent Davis, Harrison Rush, Anthony Potdevin, Jesse Shrader, Vikash Singh, Emanuele Rossi

AI总结 本文研究了如何利用机器学习从公开的路由信息数据中预测闪电网络中通道关闭的类型,将其建模为一个动态图上的时序链接分类问题。研究构建了一个涵盖两年多闪电网络活动的数据集,并对比了多种机器学习方法,包括多层感知机、时序图神经网络等。实验表明,时间与行为特征(如节点活跃时间和历史关闭记录)是预测的主要信号,而网络拓扑结构则无额外帮助。研究还指出,由于闪电网络的隐私机制隐藏了关键信息,仅凭路由数据难以准确预测通道关闭情况。

Comments 8 pages, 7 figures, 3 tables

详情
英文摘要

The Lightning Network (LN) is a second-layer protocol for Bitcoin designed to enable fast and cost-efficient off-chain transactions. Channels in the LN can be closed either by mutual agreement or unilaterally through a forced closure, which locks the involved capital for an extended period and degrades network reliability. In this paper, we study the problem of predicting channel closure types from publicly available gossip data, framing it as a temporal link classification task over the evolving channel graph. We construct a dataset spanning over two years of LN activity and benchmark a range of machine learning approaches, from MLPs to temporal graph neural networks and spectral encodings. Our experiments reveal that the dominant predictive signals are temporal and behavioural, namely how recently each endpoint was active and the per-node history of past closures, while the surrounding network topology provides no additional benefit. We find that a simple MLP operating on edge-level features, node-level event counts, and temporal patterns outperforms all graph-based approaches, and discuss how the inherent privacy of the LN, where critical information such as channel balances and payment flows remains hidden, fundamentally limits the predictability of closures from gossip data alone. We publicly release the dataset and code at https://github.com/AmbossTech/ln-channel-closure-prediction to encourage further research on this practically relevant task.

2605.12755 2026-05-14 cs.AI

State-Centric Decision Process

Sungheon Jeong, Ryozo Masukawa, Sanggeon Yun, Mahdi Imani, Mohsen Imani

AI总结 本文提出了一种名为“状态中心决策过程”(SDP)的运行时框架,用于解决语言环境(如网页浏览器、代码终端等)中缺乏明确状态空间和转移结构的问题。该方法通过让智能体逐步构建状态空间,利用自然语言谓词描述期望的环境状态,并通过行动验证观测结果,从而生成认证的状态转移路径。实验表明,SDP在多个基准任务中取得了最佳的无训练结果,并支持对智能体行为进行更精细的分析与优化。

详情
英文摘要

Language environments such as web browsers, code terminals, and interactive simulations emit raw text rather than states, and provide none of the runtime structure that MDP analysis requires. No explicit state space, no observation-to-state mapping, no certified transitions, and no termination criterion. We introduce the State-Centric Decision Process (SDP), a runtime framework that constructs these missing inputs by having the agent build them, predicate by predicate, as it acts. At each step the agent commits to a natural-language predicate describing how the world should look, takes an action to make it true, and checks the observation against it. Predicates that pass become certified states, and the resulting trajectory carries the four objects language environments do not provide, namely a task-induced state space, an observation-to-state mapping, certified transitions, and a termination criterion. We evaluate SDP on five benchmarks spanning planning, scientific exploration, web reasoning, and multi-hop question answering. SDP achieves the best training-free results on all five, with the advantage widening as the horizon grows. The certified trajectories additionally support analyses unavailable to reactive agents, including per-predicate credit assignment, failure localization, partial-progress measurement, and modular operator replacement.

2605.12754 2026-05-14 cs.LG

Constraint-Aware Flow Matching: Decision Aligned End-to-End Training for Constrained Sampling

Jacob K. Christopher, James E. Warner, Ferdinando Fioretto

AI总结 该论文提出了一种名为“Constraint-Aware Flow Matching”的新方法,旨在解决深度生成模型在满足物理约束条件时训练与采样目标不一致的问题。该方法通过在训练目标中显式引入约束投影,使模型学习的动力学过程与受约束的采样过程对齐,从而减少投影修正引起的分布偏移,提升生成质量。实验表明,该方法在多个现实场景中表现出良好的泛化性和有效性。

详情
英文摘要

Deep generative models provide state-of-the-art performance across a wide array of applications, with recent studies showing increasing applicability for science and engineering. Despite a growing corpus of literature focused on the integration of physics-based constraints into the generation process, existing approaches fail to enforce strict constraint satisfaction while maintaining sample quality. In particular, training-free constrained sampling methods, while providing per-sample feasibility guarantees, introduce a fundamental mismatch between the training objective and the constrained sampling procedure, often leading to performance degradation. Identifying this training-sampling misalignment as a central limitation of current constrained generative modeling approaches, this paper proposes Constraint-Aware Flow Matching, a novel end-to-end framework that explicitly incorporates constraint projections into the training objective. By aligning the model's learned dynamics with the constrained sampling process, the proposed method mitigates distributional shift induced by projection-based corrections, enabling high-quality constrained generation. The proposed approach is evaluated on three challenging real-world benchmarks, illustrating the generality and efficacy of the method.

2605.12752 2026-05-14 cs.LG

Low-Rank Adapters Initialization via Gradient Surgery for Continual Learning

Joana Pasquali, Ramiro N. Barros, Arthur S. Bianchessi, Vinícius Conte Turani, João Vitor Boer Abitante, Rafaela Cappelari Ravazio, Christian Mattjie, Otávio Parraga, Lucas S. Kupssinskü, Rodrigo C. Barros

AI总结 本文研究了在持续学习场景下如何有效初始化低秩适配器(LoRA),以缓解灾难性遗忘问题。作者提出了一种基于梯度手术的初始化方法SLICE,通过整合当前任务和回放任务的梯度,利用投影操作进行协调,并通过截断奇异值分解(t-SVD)生成适配器权重,从而提升模型在持续学习中的稳定性和适应性。实验表明,SLICE在多个基准测试中优于现有方法,在保持模型整体性能的同时,显著提升了平均表现和遗忘控制能力。

详情
英文摘要

LoRA is widely adopted for continual fine-tuning of Large Language Models due to its parameter efficiency, modularity across tasks, and compatibility with replay strategies. However, LoRA-based continual learning remains vulnerable to catastrophic forgetting, whose severity depends on how successive task gradients interact: when consecutive task gradients conflict, standard adapter initializations channel updates into subspaces that overwrite previously learned directions. We propose SLICE, a gradient-surgery-based initialization for LoRA adapters in continual learning. SLICE accumulates gradients from both the current task and a replay buffer of prior tasks, reconciles them through a projection operator, and decomposes the result via truncated SVD to initialize the adapter weights. We evaluate SLICE on the TRACE benchmark and sequences of Super-NI tasks, including a set of adversarial Super-NI sequences that we construct by mining task pairs with maximally opposing gradients. Compared to vanilla LoRA, LoRA-GA, and LoRAM, SLICE consistently achieves a better stability-plasticity trade-off, improving Average Performance, Final Performance and Forgetting metrics while preserving General Performance and In Context Performance across both standard and adversarial continual learning sequences.

2605.12748 2026-05-14 cs.CL cs.AI cs.CY cs.LG

Simulating Students or Sycophantic Problem Solving? On Misconception Faithfulness of LLM Simulators

Heejin Do, Shashank Sonkar, Mrinmaya Sachan

AI总结 该研究探讨了大语言模型(LLM)作为模拟学生的有效性,指出当前评估方法主要关注输出与真实学生的相似性,而忽视了模型是否能像学生一样保持连贯的误解并根据反馈进行选择性修正。为此,研究提出了一种新的评估框架和指标“选择性翻转分数”(SFS),用于衡量模型在面对针对性反馈时修正答案的能力。实验发现,现有模型在不同反馈条件下修正答案的频率相近,表现出“谄媚式”行为,即倾向于直接放弃原有信念而重新解答。研究进一步提出了一种后训练方法,有效提升了模型在误解一致性方面的表现。

详情
英文摘要

Large language models (LLMs) can fluently generate student-like responses, making them attractive as simulated students for training and evaluating AI tutors and human educators. Yet such simulators are typically evaluated by output similarity to real students, not by whether they behave like students with coherent misconceptions during interaction. We introduce a controlled framework for evaluating misconception faithfulness, whether a simulator maintains a misconception-driven belief state and updates selectively when feedback addresses the underlying misconception. Central to our framework is a misconception-contrastive feedback protocol that compares targeted feedback against two controls: misaligned feedback (targeting a different but plausible misconception) and generic feedback (only identifying answer is wrong). We propose Selective Flip Score (SFS), which quantifies how much more often a simulator flips its answer under targeted feedback than under contrastive controls. Across seven LLMs (4B-120B), multiple datasets, and prompting strategies, simulators exhibit near-zero SFS, correcting their answers at similarly high rates regardless of feedback relevance. Further analyses reveal a sycophantic failure mode: models behave less like students with misconceptions but more like problem-solvers who treat any corrective signal as a cue to abandon the simulated belief and re-solve from internal knowledge. To address this, we develop a post-training pipeline spanning supervised fine-tuning (SFT), preference optimization, and reinforcement learning (RL) with an SFS-aligned reward; SFT yields notable gains up to +0.56, and SFS-aligned RL provides more consistent improvements than preference optimization. Our results establish misconception faithfulness as a challenging yet trainable property, motivating a shift from static output matching toward interactive, belief-aware student modeling.

2605.12741 2026-05-14 cs.LG

Learning with Rare Success but Rich Feedback via Reflection-Enhanced Self-Distillation

Yuwei Zhang, Sha Li, Changlong Yu, Qin Lu, Shuowei Jin, Chengyu Dong, Haoran Liu, Ilgee Hong, Xintong Li, Zhenyu Shi, Bing Yin, Jingbo Shang

AI总结 本文研究了如何使大语言模型在与环境交互中持续改进,特别是在成功案例稀少的情况下。为此,提出了一种基于反思增强的自蒸馏框架(RESD),通过将失败反馈转化为积极的纠正信号,生成回顾性反思以诊断局部错误,并构建全局经验库以保留可复用的知识。实验表明,RESD在持续学习任务中显著优于传统自蒸馏方法,且在早期阶段表现出更高的交互效率。

Comments Work in progress

详情
英文摘要

Enabling Large Language Models (LLMs) to continuously improve from environmental interactions is a central challenge in post-training. While on-policy self-distillation offers a promising paradigm, existing methods predominantly treat environmental feedback as a passive conditioning signal. Consequently, they heavily rely on successful demonstrations and struggle to learn in rare-success regimes. To bridge this gap, we introduce Reflection-Enhanced Self-Distillation (RESD), a framework that transforms raw failure feedback into an active source of corrective supervision. Instead of passively appending feedback, RESD interprets failed trajectories by generating retrospective reflections to diagnose local errors, and curates a persistent global playbook to preserve reusable lessons across training steps. The enriched context enables the self-teacher to provide actionable token-level supervision even in the absence of successful rollouts. Empirical evaluations on multiple continual learning tasks demonstrate that RESD substantially outperforms standard self-distillation baselines. Furthermore, RESD achieves significantly faster early-stage improvement than GRPO with $8\times$ samples using only a single rollout per prompt, highlighting its superior interaction efficiency.

2605.12736 2026-05-14 cs.LG

ConRetroBert: EMA Stabilized Dual Encoders for Template-Based Single-Step Retrosynthesis

Mohammad Jahid Ibna Basher, Ali Khodabandeh Yalabadi, Ivan Garibay, Ozlem Ozmen Garibay

AI总结 ConRetroBert 是一种基于模板的单步逆合成方法,通过双编码器框架将模板选择问题转化为密集模板检索与候选集排序任务。该方法采用对比预训练学习产品与反应模板的共享嵌入空间,并引入多正例列表排序目标优化模板排名,同时利用指数移动平均技术稳定模板编码器更新,提升模型鲁棒性。实验表明,ConRetroBert 在 USPTO-50k 数据集上显著提升了反应预测准确率,并在稀有模板预测方面表现出色。

Comments Submitted to NeurIPS 2026 Main Conference

详情
英文摘要

Template based single step retrosynthesis predicts reactants by selecting and applying an explicit reaction template, making each prediction traceable to a chemical transformation rule. This is useful for synthesis planning, but template based methods are often viewed as less competitive than template free models because template prediction is commonly formulated as global classification over a long tailed rule library. We argue that this weakness is not inherent to templates, but to the learning formulation. We present ConRetroBert, a dual encoder framework that reframes template based retrosynthesis as dense product template retrieval followed by candidate set listwise ranking. Stage 1 uses contrastive pretraining to learn a shared embedding space between products and reaction templates. Stage 2 refines template ranking over mined hard negative candidate sets with a multi positive listwise objective. To enable template side adaptation without destabilizing hard negative mining, ConRetroBert uses a slow moving exponential moving average template encoder for retrieval bank construction while updating the live template encoder through the ranking loss. On the local USPTO-50k benchmark, Stage 2 candidate set ranking improves top-1 reaction accuracy from 50.5% to 61.3%, while EMA stabilized template adaptation further improves it to 62.4%. Fine tuning from a leakage controlled USPTO-Full checkpoint reaches 75.4% top-1 accuracy on USPTO-50k. We also show that retrieval based template prediction is strong in the long tail of rare templates, and that many correct reactant predictions arise from alternative explicit templates rather than only the recorded positive label. Code and data are available at https://github.com/JahidBasher/ConRetroBert.

2605.12735 2026-05-14 cs.RO

The Unified Autonomy Stack: Toward a Blueprint for Generalizable Robot Autonomy

Mihir Dharmadhikari, Nikhil Khedekar, Mihir Kulkarni, Morten Nissov, Martin Jacquet, Angelos Zacharia, Marvin Harms, Albert Gassol Puigjaner, Philipp Weiss, Kostas Alexis

AI总结 本文介绍了并开源了“统一自主系统栈”(Unified Autonomy Stack),这是一个面向空中和地面机器人形态的系统级解决方案,旨在实现鲁棒的通用自主性。该系统包含多模态感知、多行为规划和多层级安全导航三个协同模块,通过融合激光雷达、雷达、视觉和惯性传感器数据,实现了环境建模、语义理解、路径规划与安全导航等功能,能够在无GNSS信号、复杂和高障碍物密度的环境中实现安全自主导航与探索。该系统已在多种空中和地面机器人上进行了实地测试,验证了其在复杂环境中的稳定性能。

Comments 35 pages, 22 figures, 8 tables

详情
英文摘要

We introduce and open-source the Unified Autonomy Stack, a system-level solution that enables resilient autonomy across diverse aerial and ground robot morphologies. The architecture centers on three synergistic modules -- multi-modal perception, multi-behavior planning, and multi-layered safe navigation -- that together deliver comprehensive mission autonomy. The stack fuses data from LiDAR, radar, vision, and inertial sensing, enabling (a) robust localization and mapping through factor graph-based fusion, (b) semantic scene understanding, (c) motion and informative path planning through sampling-based techniques adaptive across spatial scales, as well as (d) multi-layered safe navigation both through planning on the online reconstructed map and deep learning-driven exteroceptive policies alongside last-resort safety filters using control barrier functions. The resulting behaviors include safe GNSS-denied navigation into unknown and perceptually-degraded regions, exploration of complex environments, object discovery, and efficient inspection planning. The stack has been field-tested and validated on both aerial (rotorcraft) and ground (legged) robots operating in a host of demanding environments, including self-similar and smoke-filled settings, with complex geometries and high obstacle clutter. These tests demonstrate resilient performance in challenging conditions. To facilitate ease of adoption, we open-source the implementation alongside supporting documentation, validation, and evaluation datasets https://github.com/ntnu-arl/unified_autonomy_stack. A video giving the overview of the paper and the field experiments is available at https://youtu.be/l8Su8OXsM-E.

2605.12733 2026-05-14 cs.LG cs.AI stat.ML

From Generalist to Specialist Representation

Yujia Zheng, Fan Feng, Yuke Li, Shaoan Xie, Kevin Murphy, Kun Zhang

AI总结 本文研究了从通用模型中学习任务相关的专家表征问题,核心在于在非参数设定下证明任务结构和任务相关潜在表征的可识别性。研究无需干预、参数形式或结构约束,证明了即使在时间序列缺乏严格时序依赖或存在断开的情况下,任务结构仍可在完全无监督条件下被识别,同时在每个时间步内,通过简单的稀疏性正则化可将任务相关与无关部分分离。这些结果为从通用模型向专家模型的可证性转变奠定了理论基础。

Comments ICML 2026

详情
英文摘要

Given a generalist model, learning a task-relevant specialist representation is fundamental for downstream applications. Identifiability, the asymptotic guarantee of recovering the ground-truth representation, is critical because it sets the ultimate limit of any model, even with infinite data and computation. We study this problem in a completely nonparametric setting, without relying on interventions, parametric forms, or structural constraints. We first prove that the structure between time steps and tasks is identifiable in a fully unsupervised manner, even when sequences lack strict temporal dependence and may exhibit disconnections, and task assignments can follow arbitrarily complex and interleaving structures. We then prove that, within each time step, the task-relevant latent representation can be disentangled from the irrelevant part under a simple sparsity regularization, without any additional information or parametric constraints. Together, these results establish a hierarchical foundation: task structure is identifiable across time steps, and task-relevant latent representations are identifiable within each step. To our knowledge, each result provides a first general nonparametric identifiability guarantee, and together they mark a step toward provably moving from generalist to specialist models.

2605.12730 2026-05-14 cs.AI cs.GR cs.MA physics.soc-ph

BEHAVE: A Hybrid AI Framework for Real-Time Modeling of Collective Human Dynamics

Helene Malyutina

AI总结 本文提出BEHAVE,一种用于实时建模群体人类动态行为的混合人工智能框架。传统AI系统多关注个体行为或事后事件检测,难以捕捉群体稳定、升级或崩溃等集体动态特性。BEHAVE将群体视为具有涌现性、非线性、反馈环和临界点敏感性的复杂动态系统,通过可观测的物理信号构建交互空间,并将其建模为连续行为场,从而实现对群体状态的分布式表征与预测。该框架结合数学定理与神经网络模型,在多个实际场景中展示了其对群体动态的有效建模与预测能力。

Comments 19 pages

详情
英文摘要

Existing AI systems for modeling human behavior operate at the level of individuals or detect events after they occur. As a result, they systematically fail to capture the collective dynamics that determine whether a group remains stable or transitions into escalation or breakdown. We propose a different foundation: a group of interacting humans constitutes a complex dynamical system in the precise mathematical sense, exhibiting emergence, nonlinearity, feedback loops, sensitivity near critical points, and phase transitions between qualitatively distinct regimes. The state of such a system is not located within any single participant; it is distributed across mutual influence loops and observable through the micro-dynamics of the body. We introduce BEHAVE (Behavioral Engine for Human Activity Vector Estimation), a formal framework that models collective dynamics as continuous behavioral fields defined over an interaction space derived from observable physical signals. Kinematic micro-signals (position, velocity, body orientation, gestural activity) are structured into a directed interaction graph and aggregated into a basis of behavioral fields capturing distinct, non-redundant axes of collective state. The framework rests on one theorem and two structural propositions characterizing the tension field, the field basis, and the criticality index. Perception and forecasting layers are implemented using neural models, enabling data-driven learning and approximation of system dynamics. BEHAVE is formulated as a computational system for learning, representing, and forecasting collective dynamics from data. A working pipeline is demonstrated on a 7-agent negotiation snapshot. The same fields, recalibrated, apply to crowd safety, crisis-team dynamics, education, and clinical contexts.

2605.12726 2026-05-14 cs.LG

Before the Last Token: Diagnosing Final-Token Safety Probe Failures

Shravan Doda

AI总结 该研究探讨了最终token安全探针在检测有害内容时的失效问题,指出某些越狱提示中的危险信息可能分布在早期token中,而未被最终token读取所捕捉。通过分析多个指令微调大语言模型中的隐藏状态,研究发现现有探针在召回干净有害提示时表现良好,但容易遗漏越狱案例并产生误报。研究进一步提出了一种基于PCA-HMM的轨迹模型,能够有效恢复被最终token探针遗漏的安全风险,为安全检测提供了新的分析思路。

Comments 8 pages, 2 figures, 7 tables

详情
英文摘要

Final-token safety probes monitor a single hidden state after prompt prefill, but jailbreak prompts can contain probe-visible unsafe evidence distributed across earlier user-token representations that is missed by this readout. We study this prefill-time failure mode using SafeSwitch-style probes trained only on clean harmful and benign prompts across three instruction-tuned LLMs. The probes achieve high recall on clean harmful prompts, but miss many jailbreaks and can produce false positives on safety-adjacent benign prompts. Subspace analyses suggest that missed jailbreaks differ from clean benign prompts along directions that are poorly captured by the probe's representational subspace, and increasing probe bottleneck width does not reliably resolve this mismatch. Token-level prefill analyses reveal that probe-visible unsafe evidence often appears earlier in the sequence but is not exposed at the final-token readout, while naive max-pooling over token positions overfires on safe prompts. A simple PCA-HMM trajectory model, trained only on the same clean split, recovers many final-token misses from user-content prefill trajectories without the catastrophic false-positive behavior of naive token pooling, motivating trajectory-aware hidden-state analyses as diagnostic complements to final-token probes

2605.12725 2026-05-14 cs.CV

Is Video Anomaly Detection Misframed? Evidence from LLM-Based and Multi-Scene Models

Furkan Mumcu, Michael J. Jones, Anoop Cherian, Yasin Yilmaz

AI总结 近年来,视频异常检测研究逐渐转向构建跨场景的通用正常行为模型,但这一趋势忽视了场景特定和上下文依赖的正常行为特性。现有方法常依赖多模态大语言模型的预训练表示和视频级弱监督,导致模型更关注语义层面的异常类别,而非特定环境中的正常行为偏差。本文通过视觉分析和实验评估指出,这种做法削弱了空间定位能力,引入语义偏差,并将异常检测简化为动作识别,强调视频异常检测应在单一场景中重新聚焦于空间感知和可解释的正常行为建模。

详情
英文摘要

Recent video anomaly detection research has expanded rapidly with an emphasis on general models of normality intended to work across many different scenes. While this focus has led to improvements in scalability and multi-scene generalization, it has also shifted the field away from modeling the scene-specific and context-dependent nature of normal behavior. Contemporary approaches frequently rely on video-level weak supervision and opaque pretrained representations from multi-modal large language models (MLLMs), which encourage models to respond to familiar semantic anomaly categories rather than to deviations from the normal patterns of a particular environment. This trend suppresses spatial localization, introduces semantic bias, and reduces anomaly detection to a form of action recognition. In this paper, we examine whether these prevailing formulations align with the core requirements of real-world VAD, which is typically performed within a single scene where normality is determined by local geometry, semantics, and activity patterns. Through targeted visual analyses and empirical evaluations, we demonstrate the practical consequences of these limitations and show that meaningful progress in VAD requires renewed focus on single-scene, spatially-aware, and explainable formulations that capture the nuanced structure of normality within individual environments.

2605.12724 2026-05-14 cs.CV cs.AI

Inline Critic Steers Image Editing

Weitai Kang, Xiaohang Zhan, Yizhou Wang, Mang Tik Chiu, Jason Kuen, Kangning Liu, Yan Yan

AI总结 本文研究了基于指令的图像编辑中不同区域的难度差异问题,提出了一种在生成过程中实时修正模型输出的方法。核心方法是引入一个可学习的“Inline Critic”模块,在模型中间层对生成结果进行评估,并引导后续生成过程。该方法通过三阶段训练策略稳定模型学习,显著提升了图像编辑的效果,在多个基准测试中取得了当前最优性能。

Comments 9 pages

详情
英文摘要

Instruction-based image editing exhibits heterogeneous difficulty not only across cases but also across regions of an image, motivating refinement approaches that allocate correction to where the model struggles. Existing refinement signals arrive late, after a fully generated image or a completed denoising step. We ask whether such a signal can act within an ongoing forward pass. To investigate this, we probe a frozen image-editing model and find that although generation capability emerges only in the last few layers, the error pattern is already set in early layers (rank correlation \r{ho} = 0.83 with the final-layer error map). Based on this, we introduce Inline Critic, a learnable token that critiques a frozen model's predictions at its intermediate layers and steers its hidden states to refine generation during the forward pass. A three-stage recipe is proposed to stabilize the training from learning how to critique to steering generation. As a result, we achieve state of the art on GEdit-Bench (7.89), a +9.4 gain on RISEBench over the same backbone, and the strongest open-source result on KRIS-Bench (81.92, surpassing GPT-4o). We further provide analyses showing that the critic genuinely shapes the model's attention and prediction updates at subsequent layers.

2605.12719 2026-05-14 cs.RO cs.LG

A Five-Layer MLOps Architecture for Connected Automated Driving

Bastian Lampe, Lutz Eckstein

AI总结 自动驾驶系统(ADS)在复杂、动态的开放环境中运行,其安全性和性能的持续保障面临重大挑战。本文提出了一种基于MLOps原理的五层架构,旨在支持自动驾驶系统通过车队协同学习实现持续改进。该架构为车队运营商及相关利益方提供了设计和实施MLOps流程的概念蓝图,通过多层级的自我评估机制,有助于检测和减少包括黑天鹅事件在内的边缘案例。

Comments 8 pages, 6 figures

详情
英文摘要

The continual assurance of safety and performance of automated driving systems (ADSs) poses significant challenges. ADSs operate in complex, dynamic, open-world environments allowing a wide range of scenarios, including ones that are rare or not foreseen during initial development. While the incorporation of artificial intelligence (AI) and machine learning (ML) technology allows ADSs to learn from data gathered during operation and thus enables them to adapt over time, these approaches come with their own challenges. A key advantage of ADSs compared to human drivers is their greater ability to gather data collectively across a fleet of vehicles, or even across multiple fleets operated by different entities, and to learn from this data collectively. Vehicles can share and combine their data to identify additional learning opportunities otherwise missed by individual vehicles. This creates new opportunities to tackle the challenges of continual assurance of safety and performance, but requires the implementation of architectures that leverage the collective learning potential. Based on established MLOps principles and existing work in the field of connected automated driving, this paper presents a five-layer architecture for collective learning-enabled MLOps processes for ADSs. The goal of this architecture is to provide a conceptual blueprint for the design and implementation of MLOps processes by fleet operators and other relevant stakeholders. The paper describes the main responsibilities of each layer, their interactions, and how multi-level self-assessments enabled by the architecture can support the detection and reduction of edge cases including black swan events.

2605.12718 2026-05-14 cs.AI cs.LG cs.MA

CHAL: Council of Hierarchical Agentic Language

Tommaso Giovannelli, Griffin D. Kent

AI总结 本文提出了一种名为CHAL的多智能体辩论框架,旨在通过可反驳的论证优化信念系统,解决当前多智能体辩论在结构上的局限性。CHAL引入了基于图结构的信念表示和梯度引导的动态更新机制,并将元认知价值系统作为可配置参数,以指导智能体的推理与裁决过程。该框架在多个领域展示了良好的泛化能力,并为构建透明、可审计的AI系统提供了基础。

详情
英文摘要

Multi-agent debate has emerged as a promising approach for improving LLM reasoning on ground-truth tasks, yet current methodologies face certain structural limitations: debate tends to induce a martingale over belief trajectories, majority voting accounts for most observed gains, and LLMs exhibit confidence escalation rather than calibration across rounds. We argue that the genuine value of debate, and dialectic systems as a whole, lies not in ground-truth tasks but in defeasible domains, where every position can in principle be defeated by better reasoning. We present the Council of Hierarchical Agentic Language (CHAL), a multi-agent dialectic framework that treats defeasible argumentation as an engine for belief optimization. Each agent maintains a CHAL Belief Schema (CBS), a graph-structured belief representation with a Bayesian-inspired architecture, that facilitates belief revision through a gradient-informed dynamic mechanism by leveraging the strength of the belief's thesis as a differentiable objective. Meta-cognitive value systems spanning epistemology, logic, and ethics are elevated to configurable hyperparameters governing agent reasoning and adjudication outcomes. We provide a series of ablation experiments that demonstrate systematic and interpretable effects: the adjudicator's value system determines the debate's overall trajectories in latent belief space, council diversity refines beliefs for all participants, and the framework generalizes across broad fields. CHAL is, to our knowledge, the first framework to treat multi-agent debate as structured belief optimization over defeasible domains. Further, the auditable belief artifacts it produces establish the foundation for dedicated evaluation suites for defeasible argumentation, with broader implications for building AI systems whose reasoning and value commitments are transparent, aligned, and subject to human oversight.

2605.12714 2026-05-14 cs.LG cs.CL

Layer-wise Representation Dynamics: An Empirical Investigation Across Embedders and Base LLMs

Jingzhou Jiang, Yi Yang, Kar Yan Tam

AI总结 该研究提出了一种名为Layer-wise Representation Dynamics(LRD)的框架,用于分析现代语言模型各层表示的变化特性,包含三个测量指标:用于全局子空间运动的Frenet、用于局部近邻保留的Neighborhood Retention Score(NRS)以及用于对齐最终层的Graph Filtration Mutual Information(GFMI)。通过在31种模型和30个MTEB任务上的实验,揭示了不同架构和任务在层间表示上的差异,并展示了LRD在无标签模型选择和推理时层剪枝中的应用价值,表明层间结构信息对模型解释和部署决策具有重要意义。

详情
英文摘要

Hidden states change substantially across the layers of modern language models, but most layer-wise analyses focus on one aspect of that change. We propose Layer-wise Representation Dynamics (LRD), a framework with three layer-wise measurement families: Frenet (Grassmann speed and curvature) for global subspace motion, Neighborhood Retention Score (NRS) for local nearest-neighbor retention, and Graph Filtration Mutual Information (GFMI) for alignment with the final layer. Applying LRD to 31 models (encoder-based and decoder-based embedders, plus base LLMs) on 30 MTEB tasks reveals architectural and task-level differences that are not apparent from final-layer representations alone. We then use LRD for two applications: label-free model selection and inference-time layer pruning. For selection, all three model-level scores correlate positively with downstream MTEB performance, with end-to-end subspace displacement (d_{0,L}) the strongest, and the same direction holds on a smaller base-LLM MMLU panel. For pruning, GFMI is the only measurement-guided rule that beats Random at the 15% and 20% budgets and has the best median change at every budget. Frenet is effective only at the lightest budget, while NRS does not transfer from model selection to pruning. These results show that layer-wise structure provides signal for both interpretation and deployment decisions.

2605.12710 2026-05-14 cs.RO

Belief-Space Residual Risk for Automated Driving under Localization Uncertainty

Nijinshan Karunainayagam, Nils Gehrke, Frank Diermeyer

AI总结 本文研究了在定位不确定性条件下自动驾驶系统的残余风险评估问题。为准确反映车辆自身位置的不确定性,作者将残余风险度量扩展到信念空间,将自身姿态不确定性建模为高斯分布,并重新定义残余风险为该分布下风险退化期望值。通过粒子滤波框架下的协方差融合方法,将定位不确定性纳入碰撞概率计算,提升了风险评估的鲁棒性。

Comments 7 Pages, this work has been accepted for publication in IEEE Intelligent Transportation Systems (ITSC) 2026. The final published version will be available via IEEE Xplore

详情
英文摘要

Residual risk metrics have recently been introduced to assess the safety implications of automated driving systems. Existing approaches typically assume a deterministic ego pose and concentrate mainly on perception errors related to surrounding objects and latency effects. In practice, however, automated vehicles operate under considerable localization uncertainty, especially in complex urban settings and in adverse weather conditions. This work extends the spatial residual risk formulation to the belief space by explicitly modeling ego pose uncertainty as a Gaussian distribution. Residual risk is reformulated as the expected degradation-induced risk over the ego pose belief distribution. Within a particle-based risk estimation framework, localization uncertainty is incorporated into the computation of collision probabilities through covariance fusion of ego and object uncertainties.

2605.12709 2026-05-14 cs.LG

Spectral Energy Centroid: a Metric for Improving Performance and Analyzing Spectral Bias in Implicit Neural Representations

Tomasz Dądela, Adam Kania, Maciej Rut, Przemysław Spurek

AI总结 本文提出了一种名为光谱能量质心(SEC)的度量方法,用于分析和提升隐式神经表示(INRs)的性能。SEC能够量化目标图像的频率特性以及INR模型的频谱偏差,揭示了频率与INR性能之间的关系。研究展示了SEC在三个任务中的有效性,包括超参数选择、信号复杂度评估以及跨不同架构的频谱偏差对齐,为理解与优化INR提供了新的分析工具。

详情
英文摘要

Implicit Neural Representations (INRs) model continuous signals using multilayer perceptrons (MLPs), enabling compact, differentiable, and high-fidelity representations of data across diverse domains. However, due to the low-frequency bias of MLPs that prevents effective learning of small details, the model's frequency must be carefully tuned through the embedding layer. Prior work established that this tuning can be performed before training based on the target signal, but it did not account for the significant effect of model depth, indicating that our understanding of the relationship between frequency and INR performance remains limited. To gain insights into this relationship, we utilize the Spectral Energy Centroid (SEC) metric that quantifies the frequency of target images and the spectral bias of INR models. We show that SEC is a versatile tool for INR analysis, demonstrating its utility across three tasks: (1) a data-driven strategy (SEC-Conf) for hyperparameter selection that outperforms existing heuristics and is robust to model depth, (2) a reliable proxy for signal complexity, and (3) effective alignment of spectral biases across diverse INR architectures.

2605.12706 2026-05-14 cs.LG q-bio.GN

A Resampling-Based Framework for Network Structure Learning in High-Dimensional Data

Ziwei Huang, Zeyuan Song, Paola Sebastiani, Stefano Monti

AI总结 RSNet 是一个开源的 R 软件包,提供了一种基于重采样的框架,用于在高维数据中进行稳健且可解释的网络结构学习,旨在解决小样本量带来的挑战。该框架支持连续和离散混合数据类型的条件高斯贝叶斯网络及部分相关网络的估计,并结合多种重采样策略以适应独立或相关观测。RSNet 通过引入基于图元的拓扑分析,增强了网络结构的可解释性,并首次实现了在稀疏网络中高效构建带符号的图元度向量矩阵,从而支持对高阶网络结构的可扩展分析。

Comments 7 pages, 1 figure

详情
英文摘要

RSNet is an open-source R package that provides a resampling-based framework for robust and interpretable network inference, designed to address the limited-sample-size challenges common in high-dimensional data. It supports both the estimation of partial correlation networks modeled as Gaussian networks and conditional Gaussian Bayesian networks for mixed data types that combine continuous and discrete variables. The framework incorporates multiple resampling strategies, including bootstrap, subsampling, and cluster-based approaches, to accommodate both independent and correlated observations. To enhance interpretability, RSNet integrates graphlet-based topology analysis that captures higher-order connectivity and edge sign information, enabling single-node and subnetwork-level insights. Notably, RSNet is the first R package to efficiently construct signed graphlet degree vector matrices (GDVMs) in near-constant time for sparse networks, providing scalable analysis of higher-order network structure. Collectively, RSNet offers a versatile tool for statistically reliable and interpretable network inference in high-dimensional data.

2605.12705 2026-05-14 cs.LG

Early Data Exposure Improves Robustness to Subsequent Fine-Tuning

Lawrence Feng, Gaurav R. Ghosal, Jacob Mitchell Springer, Ziqian Zhong, Aditi Raghunathan

AI总结 本文研究了如何训练模型,使其在后续微调过程中仍能保持已习得的能力。通过控制实验,作者发现早期数据暴露(将微调数据混合到预训练阶段)能有效提升模型对后续微调的鲁棒性,优于传统的微调阶段应对遗忘的方法。实验表明,合理分配数据到预训练和微调阶段,能够更有效地平衡模型的初始能力和后续适应能力,为模型训练提供了新的策略方向。

详情
英文摘要

How can we train models whose post-trained capabilities survive subsequent fine-tuning? Rather than focusing on downstream interventions to mitigate forgetting of upstream capabilities, we study how upstream training choices - that is, the manner in which a capability is acquired - shape how robustly that capability is retained. We investigate this question in a controlled three-stage language-model pipeline: pretraining, post-training to acquire a target capability, and downstream fine-tuning on a new objective. Across 135M and 1B models, two post-training domains, and two downstream fine-tuning tasks, we find that immediate post-training performance does not reliably predict retention after subsequent fine-tuning: training recipes that look equivalent immediately after post-training can retain the target capability very differently after subsequent fine-tuning. In particular, early exposure - mixing post-training data into pretraining - consistently improves the frontier between retained upstream performance and downstream performance. In compute-matched experiments, where the target data must be allocated between pretraining and post-training, we find that the optimum lies at neither extreme. Together with our other empirical and theoretical findings, this supports the view that post-training drives immediate specialization while early exposure improves robustness to later forgetting. Replay and dropout, typically used to mitigate forgetting as it occurs during fine-tuning, provide complementary gains to early exposure when applied during post-training. Our findings suggest that robustness to subsequent fine-tuning should be treated as a first-class objective of upstream training, addressed preventatively through choices like early exposure rather than reactively during fine-tuning itself.

2605.12703 2026-05-14 cs.CV cs.AI

MMCL-Bench: Multimodal Context Learning from Visual Rules, Procedures, and Evidence

Yifan Chen, Fei Yin, Qingyan Bai, Zicheng Lin, Yujiu Yang

AI总结 本文介绍了 MMCL-Bench,一个用于多模态上下文学习的基准,旨在从视觉或混合模态的教学内容中学习任务相关的规则、程序和经验模式,并应用于新的视觉实例。该基准包含102个任务,涵盖规则应用、流程执行和经验归纳三个类别,评估结果显示当前主流多模态模型在严格评分标准下仍远未达到鲁棒的多模态上下文学习能力,揭示了多模态上下文学习作为当前模型的重要能力瓶颈。

详情
英文摘要

We introduce MMCL-Bench, a benchmark for multimodal context learning: learning task-local rules, procedures, and empirical patterns from visual or mixed-modality teaching context and applying them to new visual instances. Unlike text-only context learning or standard multimodal question answering, this setting requires models to recover and localize relevant evidence from images, screenshots, manuals, videos, and frame sequences before they can reason over the learned context. MMCL-Bench contains 102 tasks spanning three categories: rule system application, procedural task execution, and empirical discovery and induction. We evaluate frontier multimodal models with strict rubric-based scoring and find that current systems remain far from robust multimodal context learning, with even the strongest model solving fewer than one-third of tasks under strict evaluation. Diagnostic ablations and error analysis show that failures arise throughout the context-to-answer pipeline, including context anchoring, visual evidence extraction, context reasoning, and response construction. MMCL-Bench thus highlights multimodal context learning as an important unsolved capability bottleneck for current multimodal models.