arXivDaily arXiv每日学术速递 周一至周五更新
重置
2604.19747 2026-04-22 cs.CV

AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model

Yutian Chen, Shi Guo, Renbiao Jin, Tianshuo Yang, Xin Cai, Yawen Luo, Mingxin Yang, Mulin Yu, Linning Xu, Tianfan Xue

Comments Webpage: https://yutian10.github.io/AnyRecon/

详情
英文摘要

Sparse-view 3D reconstruction is essential for modeling scenes from casual captures, but remain challenging for non-generative reconstruction. Existing diffusion-based approaches mitigates this issues by synthesizing novel views, but they often condition on only one or two capture frames, which restricts geometric consistency and limits scalability to large or diverse scenes. We propose AnyRecon, a scalable framework for reconstruction from arbitrary and unordered sparse inputs that preserves explicit geometric control while supporting flexible conditioning cardinality. To support long-range conditioning, our method constructs a persistent global scene memory via a prepended capture view cache, and removes temporal compression to maintain frame-level correspondence under large viewpoint changes. Beyond better generative model, we also find that the interplay between generation and reconstruction is crucial for large-scale 3D scenes. Thus, we introduce a geometry-aware conditioning strategy that couples generation and reconstruction through an explicit 3D geometric memory and geometry-driven capture-view retrieval. To ensure efficiency, we combine 4-step diffusion distillation with context-window sparse attention to reduce quadratic complexity. Extensive experiments demonstrate robust and scalable reconstruction across irregular inputs, large viewpoint gaps, and long trajectories.

2604.19742 2026-04-22 cs.SE

PlayCoder: Making LLM-Generated GUI Code Playable

Zhiyuan Peng, Wei Tao, Xin Yin, Chenhao Ying, Yuan Luo, Yiwen Guo

Comments September 11, 2025 Submitted to FSE2026

详情
英文摘要

Large language models (LLMs) have achieved strong results in code generation, but their ability to generate GUI applications, especially games, remains insufficiently studied. Existing benchmarks mainly evaluate correctness through test cases, which are inadequate for GUI applications because these systems are interactive, event-driven, and require correct state transitions across sequences of user actions. Their evaluation therefore should consider interaction flows and UI logic rather than only pass/fail outcomes. To study this problem, we introduce PlayEval, a repository-aware benchmark built from 43 multilingual GUI applications in Python, TypeScript, and JavaScript. Unlike prior GUI benchmarks that are difficult to adapt to desktop environments, PlayEval covers six major GUI application categories and directly supports code-generation evaluation. We further propose Play@k, a metric that measures whether at least one of *k* generated candidates can be played end-to-end without logical errors. To support reliable evaluation, we develop PlayTester, an LLM-based agent that performs task-oriented GUI playthroughs and detects logic violations automatically. Experiments on 10 state-of-the-art code LLMs show that, despite high compilation rates, they achieve near-zero Play@3, revealing major weaknesses in generating logically correct GUI applications. To address this limitation, we present PlayCoder, a multi-agent, repository-aware framework that generates, evaluates, and iteratively repairs GUI application code in a closed loop. PlayCoder substantially improves both functional correctness and semantic alignment for open-source and closed-source models, reaching up to 38.1% Exec@3 and 20.3% Play@3. Case studies further show that it can uncover silent logic bugs missed by traditional metrics and fix them through targeted edits.

2604.19740 2026-04-22 cs.LG cs.AI cs.CV stat.ML

Generalization at the Edge of Stability

Mario Tuci, Caner Korkmaz, Umut Şimşekli, Tolga Birdal

Comments Project page: https://circle-group.github.io/research/GATES

详情
英文摘要

Training modern neural networks often relies on large learning rates, operating at the edge of stability, where the optimization dynamics exhibit oscillatory and chaotic behavior. Empirically, this regime often yields improved generalization performance, yet the underlying mechanism remains poorly understood. In this work, we represent stochastic optimizers as random dynamical systems, which often converge to a fractal attractor set (rather than a point) with a smaller intrinsic dimension. Building on this connection and inspired by Lyapunov dimension theory, we introduce a novel notion of dimension, coined the `sharpness dimension', and prove a generalization bound based on this dimension. Our results show that generalization in the chaotic regime depends on the complete Hessian spectrum and the structure of its partial determinants, highlighting a complexity that cannot be captured by the trace or spectral norm considered in prior work. Experiments across various MLPs and transformers validate our theory while also providing new insights into the recently observed phenomenon of grokking.

2604.19737 2026-04-22 cs.LG

Safe Continual Reinforcement Learning in Non-stationary Environments

Austin Coursey, Abel Diaz-Gonzalez, Marcos Quinones-Grueiro, Gautam Biswas

详情
英文摘要

Reinforcement learning (RL) offers a compelling data-driven paradigm for synthesizing controllers for complex systems when accurate physical models are unavailable; however, most existing control-oriented RL methods assume stationarity and, therefore, struggle in real-world non-stationary deployments where system dynamics and operating conditions can change unexpectedly. Moreover, RL controllers acting in physical environments must satisfy safety constraints throughout their learning and execution phases, rendering transient violations during adaptation unacceptable. Although continual RL and safe RL have each addressed non-stationarity and safety, respectively, their intersection remains comparatively unexplored, motivating the study of safe continual RL algorithms that can adapt over the system's lifetime while preserving safety. In this work, we systematically investigate safe continual reinforcement learning by introducing three benchmark environments that capture safety-critical continual adaptation and by evaluating representative approaches from safe RL, continual RL, and their combinations. Our empirical results reveal a fundamental tension between maintaining safety constraints and preventing catastrophic forgetting under non-stationary dynamics, with existing methods generally failing to achieve both objectives simultaneously. To address this shortcoming, we examine regularization-based strategies that partially mitigate this trade-off and characterize their benefits and limitations. Finally, we outline key open challenges and research directions toward developing safe, resilient learning-based controllers capable of sustained autonomous operation in changing environments.

2604.19736 2026-04-22 cs.CV

Generative Drifting for Conditional Medical Image Generation

Zirong Li, Siyuan Mei, Weiwen Wu, Andreas Maier, Lina Gölz, Yan Xia

详情
英文摘要

Conditional medical image generation plays an important role in many clinically relevant imaging tasks. However, existing methods still face a fundamental challenge in balancing inference efficiency, patient-specific fidelity, and distribution-level plausibility, particularly in high-dimensional 3D medical imaging. In this work, we propose GDM, a generative drifting framework that reformulates deterministic medical image prediction as a multi-objective learning problem to jointly promote distribution-level plausibility and patient-specific fidelity while retaining one-step inference. GDM extends drifting to 3D medical imaging through an attractive-repulsive drift that minimizes the discrepancy between the generator pushforward and the target distribution. To enable stable drifting-based learning in 3D volumetric data, GDM constructs a multi-level feature bank from a medical foundation encoder to support reliable affinity estimation and drifting field computation across complementary global, local, and spatial representations. In addition, a gradient coordination strategy in the shared output space improves optimization balance under competing distribution-level and fidelity-oriented objectives. We evaluate the proposed framework on two representative tasks, MRI-to-CT synthesis and sparse-view CT reconstruction. Experimental results show that GDM consistently outperforms a wide range of baselines, including GAN-based, flow-matching-based, and SDE-based generative models, as well as supervised regression methods, while improving the balance among anatomical fidelity, quantitative reliability, perceptual realism, and inference efficiency. These findings suggest that GDM provides a practical and effective framework for conditional 3D medical image generation.

2604.19734 2026-04-22 cs.RO cs.AI

UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling

Boyu Chen, Yi Chen, Lu Qiu, Jerry Bai, Yuying Ge, Yixiao Ge

Comments Project page: https://xpeng-robotics.github.io/unit/

详情
英文摘要

Scaling humanoid foundation models is bottlenecked by the scarcity of robotic data. While massive egocentric human data offers a scalable alternative, bridging the cross-embodiment chasm remains a fundamental challenge due to kinematic mismatches. We introduce UniT (Unified Latent Action Tokenizer via Visual Anchoring), a framework that establishes a unified physical language for human-to-humanoid transfer. Grounded in the philosophy that heterogeneous kinematics share universal visual consequences, UniT employs a tri-branch cross-reconstruction mechanism: actions predict vision to anchor kinematics to physical outcomes, while vision reconstructs actions to filter out irrelevant visual confounders. Concurrently, a fusion branch synergies these purified modalities into a shared discrete latent space of embodiment-agnostic physical intents. We validate UniT across two paradigms: 1) Policy Learning (VLA-UniT): By predicting these unified tokens, it effectively leverages diverse human data to achieve state-of-the-art data efficiency and robust out-of-distribution (OOD) generalization on both humanoid simulation benchmark and real-world deployments, notably demonstrating zero-shot task transfer. 2) World Modeling (WM-UniT): By aligning cross-embodiment dynamics via unified tokens as conditions, it realizes direct human-to-humanoid action transfer. This alignment ensures that human data seamlessly translates into enhanced action controllability for humanoid video generation. Ultimately, by inducing a highly aligned cross-embodiment representation (empirically verified by t-SNE visualizations revealing the convergence of human and humanoid features into a shared manifold), UniT offers a scalable path to distill vast human knowledge into general-purpose humanoid capabilities.

2604.19733 2026-04-22 math.CO cs.DS cs.NI cs.SI

Greedy Routing in a Sequentially Grown One-Dimensional Random Graph

Alexander Ponomarenko

详情
英文摘要

We analyze greedy routing in a random graph G_n constructed on the vertex set V = {1, 2, ..., n} embedded in Z. Vertices are inserted according to a uniform random permutation pi, and each newly inserted vertex connects to its nearest already-inserted neighbors on the left and right (if they exist). This work addresses a conjecture originating from empirical studies (Ponomarenko et al., 2011; Malkov et al., 2012), which observed through simulations that greedy search in sequentially grown graphs exhibits logarithmic routing complexity across various dimensions. While the original claim was based on experiments and geometric intuition, a rigorous mathematical foundation remained open. Here, we formalize and resolve this conjecture for the one-dimensional case. For a greedy walk GW starting at vertex 1 targeting vertex n -- which at each step moves to the neighbor closest to n -- we prove that the number of steps S_n required to reach n satisfies S_n = Theta(log n) with high probability. Precisely, S_n = L_n + R_n - 2, where L_n and R_n are the numbers of left-to-right and right-to-left minima in the insertion-time permutation. Consequently, E[S_n] = 2H_n - 2 ~ 2 log n and P(S_n >= (2+c) log n) <= n^(-h(c/2) + o(1)) for any constant c > 0, with an analogous lower tail bound for 0 < c < 2, where h(u) = (1+u) ln(1+u) - u is the Bennett rate function. Furthermore, we establish that this logarithmic scaling is robust: for arbitrary or uniformly random start-target pairs, the expected routing complexity remains E[S_{s,t}] = 2 log n + O(1), closely mirroring decentralized routing scenarios in real-world networks where endpoints are chosen dynamically rather than fixed a priori.

2604.19730 2026-04-22 cs.LG cs.AI

FASTER: Value-Guided Sampling for Fast RL

Perry Dong, Alexander Swerdlow, Dorsa Sadigh, Chelsea Finn

详情
英文摘要

Some of the most performant reinforcement learning algorithms today can be prohibitively expensive as they use test-time scaling methods such as sampling multiple action candidates and selecting the best one. In this work, we propose FASTER, a method for getting the benefits of sampling-based test-time scaling of diffusion-based policies without the computational cost by tracing the performance gain of action samples back to earlier in the denoising process. Our key insight is that we can model the denoising of multiple action candidates and selecting the best one as a Markov Decision Process (MDP) where the goal is to progressively filter action candidates before denoising is complete. With this MDP, we can learn a policy and value function in the denoising space that predicts the downstream value of action candidates in the denoising process and filters them while maximizing returns. The result is a method that is lightweight and can be plugged into existing generative RL algorithms. Across challenging long-horizon manipulation tasks in online and batch-online RL, FASTER consistently improves the underlying policies and achieves the best overall performance among the compared methods. Applied to a pretrained VLA, FASTER achieves the same performance while substantially reducing training and inference compute requirements. Code is available at https://github.com/alexanderswerdlow/faster .

2604.19729 2026-04-22 cs.LG cs.IT eess.SP math.IT

FB-NLL: A Feature-Based Approach to Tackle Noisy Labels in Personalized Federated Learning

Abdulmoneam Ali, Ahmed Arafa

Comments Submitted for journal publication

详情
英文摘要

Personalized Federated Learning (PFL) aims to learn multiple task-specific models rather than a single global model across heterogeneous data distributions. Existing PFL approaches typically rely on iterative optimization-such as model update trajectories-to cluster users that need to accomplish the same tasks together. However, these learning-dynamics-based methods are inherently vulnerable to low-quality data and noisy labels, as corrupted updates distort clustering decisions and degrade personalization performance. To tackle this, we propose FB-NLL, a feature-centric framework that decouples user clustering from iterative training dynamics. By exploiting the intrinsic heterogeneity of local feature spaces, FB-NLL characterizes each user through the spectral structure of the covariances of their feature representations and leverages subspace similarity to identify task-consistent user groupings. This geometry-aware clustering is label-agnostic and is performed in a one-shot manner prior to training, significantly reducing communication overhead and computational costs compared to iterative baselines. Complementing this, we introduce a feature-consistency-based detection and correction strategy to address noisy labels within clusters. By leveraging directional alignment in the learned feature space and assigning labels based on class-specific feature subspaces, our method mitigates corrupted supervision without requiring estimation of stochastic noise transition matrices. In addition, FB-NLL is model-independent and integrates seamlessly with existing noise-robust training techniques. Extensive experiments across diverse datasets and noise regimes demonstrate that our framework consistently outperforms state-of-the-art baselines in terms of average accuracy and performance stability.

2604.19728 2026-04-22 cs.RO cs.AI cs.CV cs.LG cs.SE

VLA Foundry: A Unified Framework for Training Vision-Language-Action Models

Jean Mercat, Sedrick Keh, Kushal Arora, Isabella Huang, Paarth Shah, Haruki Nishimura, Shun Iwase, Katherine Liu

Comments 32 pages, 16 figures, technical report

详情
英文摘要

We present VLA Foundry, an open-source framework that unifies LLM, VLM, and VLA training in a single codebase. Most open-source VLA efforts specialize on the action training stage, often stitching together incompatible pretraining pipelines. VLA Foundry instead provides a shared training stack with end-to-end control, from language pretraining to action-expert fine-tuning. VLA Foundry supports both from-scratch training and pretrained backbones from Hugging Face. To demonstrate the utility of our framework, we train and release two types of models: the first trained fully from scratch through our LLM-->VLM-->VLA pipeline and the second built on the pretrained Qwen3-VL backbone. We evaluate closed-loop policy performance of both models on LBM Eval, an open-data, open-source simulator. We also contribute usability improvements to the simulator and the STEP analysis tools for easier public use. In the nominal evaluation setting, our fully-open from-scratch model is on par with our prior closed-source work and substituting in the Qwen3-VL backbone leads to a strong multi-task table top manipulation policy outperforming our baseline by a wide margin. The VLA Foundry codebase is available at https://github.com/TRI-ML/vla_foundry and all multi-task model weights are released on https://huggingface.co/collections/TRI-ML/vla-foundry. Additional qualitative videos are available on the project website https://tri-ml.github.io/vla_foundry.

2604.19724 2026-04-22 cs.LG cs.AI

Benign Overfitting in Adversarial Training for Vision Transformers

Jiaming Zhang, Meng Ding, Shaopeng Fu, Jingfeng Zhang, Di Wang

Comments arXiv admin note: text overlap with arXiv:2409.19345 by other authors

详情
英文摘要

Despite the remarkable success of Vision Transformers (ViTs) across a wide range of vision tasks, recent studies have revealed that they remain vulnerable to adversarial examples, much like Convolutional Neural Networks (CNNs). A common empirical defense strategy is adversarial training, yet the theoretical underpinnings of its robustness in ViTs remain largely unexplored. In this work, we present the first theoretical analysis of adversarial training under simplified ViT architectures. We show that, when trained under a signal-to-noise ratio that satisfies a certain condition and within a moderate perturbation budget, adversarial training enables ViTs to achieve nearly zero robust training loss and robust generalization error under certain regimes. Remarkably, this leads to strong generalization even in the presence of overfitting, a phenomenon known as \emph{benign overfitting}, previously only observed in CNNs (with adversarial training). Experiments on both synthetic and real-world datasets further validate our theoretical findings.

2604.19722 2026-04-22 cs.LG cs.AI

Adaptive MSD-Splitting: Enhancing C4.5 and Random Forests for Skewed Continuous Attributes

Jake Lee

详情
英文摘要

The discretization of continuous numerical attributes remains a persistent computational bottleneck in the induction of decision trees, particularly as dataset dimensions scale. Building upon the recently proposed MSD-Splitting technique -- which bins continuous data using the empirical mean and standard deviation to dramatically improve the efficiency and accuracy of the C4.5 algorithm -- we introduce Adaptive MSD-Splitting (AMSD). While standard MSD-Splitting is highly effective for approximately symmetric distributions, its rigid adherence to fixed one-standard-deviation cutoffs can lead to catastrophic information loss in highly skewed data, a common artifact in real-world biomedical and financial datasets. AMSD addresses this by dynamically adjusting the standard deviation multiplier based on feature skewness, narrowing intervals in dense regions to preserve discriminative resolution. Furthermore, we integrate AMSD into ensemble methods, specifically presenting the Random Forest-AMSD (RF-AMSD) framework. Empirical evaluations on the Census Income, Heart Disease, Breast Cancer, and Forest Covertype datasets demonstrate that AMSD yields a 2-4% accuracy improvement over standard MSD-Splitting, while maintaining near-identical O(N) time complexity reductions compared to the O(N log N) exhaustive search. Our Random Forest extension achieves state-of-the-art accuracy at a fraction of standard computational costs, confirming the viability of adaptive statistical binning in large-scale ensemble learning architectures.

2604.19720 2026-04-22 cs.CV

ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis

Zhengwentai Sun, Keru Zheng, Chenghong Li, Hongjie Liao, Xihe Yang, Heyuan Li, Yihao Zhi, Shuliang Ning, Shuguang Cui, Xiaoguang Han

详情
英文摘要

Human video generation remains challenging due to the difficulty of jointly modeling human appearance, motion, and camera viewpoint under limited multi-view data. Existing methods often address these factors separately, resulting in limited controllability or reduced visual quality. We revisit this problem from an image-first perspective, where high-quality human appearance is learned via image generation and used as a prior for video synthesis, decoupling appearance modeling from temporal consistency. We propose a pose- and viewpoint-controllable pipeline that combines a pretrained image backbone with SMPL-X-based motion guidance, together with a training-free temporal refinement stage based on a pretrained video diffusion model. Our method produces high-quality, temporally consistent videos under diverse poses and viewpoints. We also release a canonical human dataset and an auxiliary model for compositional human image synthesis. Code and data are publicly available at https://github.com/Taited/ReImagine.

2604.19719 2026-04-22 cs.FL

On Languages Describing Large Graph Classes

Henning Fernau, Pamela Fleischmann, Kevin Mann, Silas Cato Sacher

Comments arXiv admin note: substantial text overlap with arXiv:2411.03274

详情
英文摘要

In this work, we introduce a new notion for representing graph classes with formal languages. In contrast to the seminal work by Kitaev and Pyatkin to represent graphs by words, we use formal binary languages in order to have a set of patterns (given by the languages' words) defining the edges in the graph. In particular, we investigate famous languages like the palindromes, copy-words, Lyndon words, and Dyck words to represent all graphs or specific graph classes by restricting these languages.

2604.19717 2026-04-22 quant-ph cs.CC

Qubit Routing for (Almost) Free

Arianne Meijer-van de Griend

Comments 14 pages, rough draft

详情
英文摘要

In this paper, we give a mathematical proof that bounds the number of CNOT gates required to synthesize an $n$ qubit phase polynomial with $g$ terms to be at least $O(\frac{gn}{\max (\log g, 1)})$ and at most $O(gn)$. However, when targeting restricted hardware, not all CNOTs are allowed. If we were to use SWAP-based methods to route the qubits on the architecture such that the earlier synthesized gates are natively allowed, we increase the number of CNOTs by a routing overhead factor of $O(\log n) \leq α\leq O(n \log^2 n)$. However, if we only synthesize allowed gates, we do not need to route any qubits. Moreover, in that case the routing overhead factor is $1 \leq α\leq 4 \simeq O(1)$. Additionally, since phase polynomials and Hadamard gates together form a universal gate set, we get qubit routing for almost free.

2604.19716 2026-04-22 cs.CL

Discovering a Shared Logical Subspace: Steering LLM Logical Reasoning via Alignment of Natural-Language and Symbolic Views

Feihao Fang, My T. Thai, Yuanyuan Lei

Comments Accepted to ACL 2026

详情
英文摘要

Large Language Models (LLMs) still struggle with multi-step logical reasoning. Existing approaches either purely refine the reasoning chain in natural language form or attach a symbolic solver as an external module. In this work, we instead ask whether LLMs contain a shared internal logical subspace that simultaneously aligns natural-language and symbolic-language views of the reasoning process. Our hypothesis is that this logical subspace captures logical reasoning capabilities in LLMs that are shared across views while remaining independent of surface forms. To verify this, we employ Canonical Correlation Analysis on the paired residual activations from natural-language and symbolic-language reasoning chains, learning a low-dimensional subspace with maximum cross-view correlation. Furthermore, we design a training-free approach that steers LLMs reasoning chain along this logical subspace, thereby leveraging the complementary reasoning signals from both views. Experiments on four logical reasoning benchmarks demonstrate the effectiveness of our approach, improving accuracy by up to 11 percentage points and generalizing well on out-of-domain problems.

2604.19715 2026-04-22 cs.CV cs.SY eess.SY

A Network-Aware Evaluation of Distributed Energy Resource Control in Smart Distribution Systems

Houchao Gan

详情
英文摘要

Distribution networks with high penetration of Distributed Energy Resources (DERs) increasingly rely on communication networks to coordinate grid-interactive control. While many distributed control schemes have been proposed, they are often evaluated under idealized communication assumptions, making it difficult to assess their performance under realistic network conditions. This work presents an implementation-driven evaluation of a representative virtual power plant (VPP) dispatch algorithm using a co-simulation framework that couples a linearized distribution-system model with packet-level downlink emulation in ns-3. The study considers a modified IEEE~37-node feeder with high photovoltaic penetration and a primal--dual VPP dispatch that simultaneously targets feeder-head active power tracking and voltage regulation. Communication effects are introduced only on the downlink path carrying dual-variable updates, where per-DER packet delays and a hold-last-value strategy are modeled. Results show that, under ideal communication, the dispatch achieves close tracking of the feeder-head power reference while maintaining voltages within the prescribed limits at selected buses. When realistic downlink delay is introduced, the same controller exhibits large oscillations in feeder-head power and more frequent voltage limit violations. These findings highlight that distributed DER control performance can be strongly influenced by communication behavior and motivate evaluation frameworks that explicitly incorporate network dynamics into the assessment of grid-interactive control schemes.

2604.19712 2026-04-22 cs.LG cond-mat.dis-nn cs.IT math.IT math.PR stat.ML

Ultrametric OGP - parametric RDT \emph{symmetric} binary perceptron connection

Mihailo Stojnic

详情
英文摘要

In [97,99,100], an fl-RDT framework is introduced to characterize \emph{statistical computational gaps} (SCGs). Studying \emph{symmetric binary perceptrons} (SBPs), [100] obtained an \emph{algorithmic} threshold estimate $α_a\approx α_c^{(7)}\approx 1.6093$ at the 7th lifting level (for $κ=1$ margin), closely approaching $1.58$ local entropy (LE) prediction [18]. In this paper, we further connect parametric RDT to overlap gap properties (OGPs), another key geometric feature of the solution space. Specifically, for any positive integer $s$, we consider $s$-level ultrametric OGPs ($ult_s$-OGPs) and rigorously upper-bound the associated constraint densities $α_{ult_s}$. To achieve this, we develop an analytical union-bounding program consisting of combinatorial and probabilistic components. By casting the combinatorial part as a convex problem and the probabilistic part as a nested integration, we conduct numerical evaluations and obtain that the tightest bounds at the first two levels, $\barα_{ult_1} \approx 1.6578$ and $\barα_{ult_2} \approx 1.6219$, closely approach the 3rd and 4th lifting level parametric RDT estimates, $α_c^{(3)} \approx 1.6576$ and $α_c^{(4)} \approx 1.6218$. We also observe excellent agreement across other key parameters, including overlap values and the relative sizes of ultrametric clusters. Based on these observations, we propose several conjectures linking $ult$-OGP and parametric RDT. Specifically, we conjecture that algorithmic threshold $α_a=\lim_{s\rightarrow\infty} α_{ult_s} = \lim_{s\rightarrow\infty} \barα{ult_s} = \lim_{r\rightarrow\infty} α_{c}^{(r)}$, and $α_{ult_s} \leq α_{c}^{(s+2)}$ (with possible equality for some (maybe even all) $s$). Finally, we discuss the potential existence of a full isomorphism connecting all key parameters of $ult$-OGP and parametric RDT.

2604.19711 2026-04-22 cs.CR cs.CY cs.HC

"We are currently clean on OPSEC": Why JD Can't Encrypt

Maurice Chiodo, Toni Erskine, Dennis Müller, James G. Wright

Comments 31 pages

详情
英文摘要

We analyse the 2025 Signalgate leak of sensitive US military information by the Trump administration, addressing why confidentiality was violated (messages leaked to the press) in spite of encryption (Signal), to deepen the socio-technical considerations when designing and deploying encryption. First, we use applied pi-calculus to formally model the boutique secure facility setup requested by the US Defence Secretary, to prove that a leak would not be prevented. We then examine how using a secure channel might still not give overall information security, as, in this case, power imbalances between personnel and officials led to the application of cryptography that compromised their operational security. We look at how cryptographic tools may have instilled a false sense of security, and led officials to "overshare". We then apply this analysis to the Trump administration's general desire to burn through political, legal, and now technical process, and demonstrate geopolitical harms that may arise from such ineffective use of cryptography in a brief use case. We conclude that, even with advancements in usability of cryptographic tools, genuine message security is still out of reach of the "average user".

2604.19710 2026-04-22 cs.CV

SpanVLA: Efficient Action Bridging and Learning from Negative-Recovery Samples for Vision-Language-Action Model

Zewei Zhou, Ruining Yang, Xuewei, Qi, Yiluan Guo, Sherry X. Chen, Tao Feng, Kateryna Pistunova, Yishan Shen, Lili Su, Jiaqi Ma

Comments Project page: https://spanvla.github.io/

详情
英文摘要

Vision-Language-Action (VLA) models offer a promising autonomous driving paradigm for leveraging world knowledge and reasoning capabilities, especially in long-tail scenarios. However, existing VLA models often struggle with the high latency in action generation using an autoregressive generation framework and exhibit limited robustness. In this paper, we propose SpanVLA, a novel end-to-end autonomous driving framework, integrating an autoregressive reasoning and a flow-matching action expert. First, SpanVLA introduces an efficient bridge to leverage the vision and reasoning guidance of VLM to efficiently plan future trajectories using a flow-matching policy conditioned on historical trajectory initialization, which significantly reduces inference time. Second, to further improve the performance and robustness of the SpanVLA model, we propose a GRPO-based post-training method to enable the VLA model not only to learn from positive driving samples but also to learn how to avoid the typical negative behaviors and learn recovery behaviors. We further introduce mReasoning, a new real-world driving reasoning dataset, focusing on complex, reasoning-demanding scenarios and negative-recovery samples. Extensive experiments on the NAVSIM (v1 and v2) demonstrate the competitive performance of the SpanVLA model. Additionally, the qualitative results across diverse scenarios highlight the planning performance and robustness of our model.

2604.19709 2026-04-22 eess.SP cs.IT math.IT

Networked Tracking of Multiple Moving Targets in 6G Network

Yanmo Hu, Weifeng Zhu, Chenshu Wu, Shuowen Zhang, J. Andrew Zhang, Liang Liu

详情
英文摘要

This paper considers a networked tracking architecture in 6G integrated sensing and communication (ISAC) systems, where multiple base stations (BSs) cooperatively transmit radio signals and process received echo signals to track multiple moving targets. Compared to the single-BS counterpart, networked tracking allows the moving targets to be associated with different BSs over time such that the wireless resources can be dynamically allocated among BSs based on target locations. However, networked tracking imposes new challenges for algorithm design and resource allocation. In this paper, we first design the networked Kalman Filter (NKF) that is suitable for multi-BS based tracking, then characterize the posterior Cramer-Rao bound (PCRB) under this NKF, and last design the beamforming vectors of all the BSs to minimize the tracking PCRB. Numerical results show that our dynamic beamforming design can properly associate the targets to the suitable BSs at various sensing blocks and reduce the tracking mean-squared error (MSE).

2604.19702 2026-04-22 cs.CV

Face Anything: 4D Face Reconstruction from Any Image Sequence

Umut Kocasari, Simon Giebenhain, Richard Shaw, Matthias Nießner

Comments Project website: https://kocasariumut.github.io/FaceAnything/ , Video: https://www.youtube.com/watch?v=wSGHpAscp0Y

详情
英文摘要

Accurate reconstruction and tracking of dynamic human faces from image sequences is challenging because non-rigid deformations, expression changes, and viewpoint variations occur simultaneously, creating significant ambiguity in geometry and correspondence estimation. We present a unified method for high-fidelity 4D facial reconstruction based on canonical facial point prediction, a representation that assigns each pixel a normalized facial coordinate in a shared canonical space. This formulation transforms dense tracking and dynamic reconstruction into a canonical reconstruction problem, enabling temporally consistent geometry and reliable correspondences within a single feed-forward model. By jointly predicting depth and canonical coordinates, our method enables accurate depth estimation, temporally stable reconstruction, dense 3D geometry, and robust facial point tracking within a single architecture. We implement this formulation using a transformer-based model that jointly predicts depth and canonical facial coordinates, trained using multi-view geometry data that non-rigidly warps into the canonical space. Extensive experiments on image and video benchmarks demonstrate state-of-the-art performance across reconstruction and tracking tasks, achieving approximately 3$\times$ lower correspondence error and faster inference than prior dynamic reconstruction methods, while improving depth accuracy by 16%. These results highlight canonical facial point prediction as an effective foundation for unified feed-forward 4D facial reconstruction.

2604.19698 2026-04-22 cs.LG math.ST stat.TH

On two ways to use determinantal point processes for Monte Carlo integration

Guillaume Gautier, Rémi Bardenet, Michal Valko

Comments NeurIPS 2019

详情
英文摘要

The standard Monte Carlo estimator $\widehat{I}_N^{\mathrm{MC}}$ of $\int fdω$ relies on independent samples from $ω$ and has variance of order $1/N$. Replacing the samples with a determinantal point process (DPP), a repulsive distribution, makes the estimator consistent, with variance rates that depend on how the DPP is adapted to $f$ and $ω$. We examine two existing DPP-based estimators: one by Bardenet & Hardy (2020) with a rate of $\mathcal{O}(N^{-(1+1/d)})$ for smooth $f$, but relying on a fixed DPP. The other, by Ermakov & Zolotukhin (1960), is unbiased with rate of order $1/N$, like Monte Carlo, but its DPP is tailored to $f$. We revisit these estimators, generalize them to continuous settings, and provide sampling algorithms.

2604.19695 2026-04-22 cs.LG

Planning in entropy-regularized Markov decision processes and games

Jean-Bastien Grill, Omar Darwiche Domingues, Pierre Ménard, Rémi Munos, Michal Valko

Comments NeurIPS 2019

详情
Journal ref
Advances in Neural Information Processing Systems 32 (NeurIPS 2019)
英文摘要

We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser makes use of the smoothness of the Bellman operator promoted by the regularization to achieve problem-independent sample complexity of order O~(1/epsilon^4) for a desired accuracy epsilon, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.

2604.19689 2026-04-22 cs.AI

A-MAR: Agent-based Multimodal Art Retrieval for Fine-Grained Artwork Understanding

Shuai Wang, Hongyi Zhu, Jia-Hong Huang, Yixian Shen, Chengxi Zeng, Stevan Rudinac, Monika Kackovic, Nachoem Wijnberg, Marcel Worring

详情
Journal ref
ICMR 2026, ACM International Conference on Multimedia Retrieval
英文摘要

Understanding artworks requires multi-step reasoning over visual content and cultural, historical, and stylistic context. While recent multimodal large language models show promise in artwork explanation, they rely on implicit reasoning and internalized knowl- edge, limiting interpretability and explicit evidence grounding. We propose A-MAR, an Agent-based Multimodal Art Retrieval framework that explicitly conditions retrieval on structured reasoning plans. Given an artwork and a user query, A-MAR first decomposes the task into a structured reasoning plan that specifies the goals and evidence requirements for each step. Retrieval is then conditionedon this plan, enabling targeted evidence selection and supporting step-wise, grounded explanations. To evaluate agent-based multi- modal reasoning within the art domain, we introduce ArtCoT-QA. This diagnostic benchmark features multi-step reasoning chains for diverse art-related queries, enabling a granular analysis that extends beyond simple final answer accuracy. Experiments on SemArt and Artpedia show that A-MAR consistently outperforms static, non planned retrieval and strong MLLM baselines in final explanation quality, while evaluations on ArtCoT-QA further demonstrate its advantages in evidence grounding and multi-step reasoning ability. These results highlight the importance of reasoning-conditioned retrieval for knowledge-intensive multimodal understanding and position A-MAR as a step toward interpretable, goal-driven AI systems, with particular relevance to cultural industries. The code and data are available at: https://github.com/ShuaiWang97/A-MAR.

2604.19686 2026-04-22 eess.SY cs.SY

Towards Reproducible Test Annotation for Cyber-Physical Energy Systems using Ontology-driven Dataspaces

Kai Heussen, Jawad Kazmi, Narges Mehran, Artjoms Obushevs, Terence O'Donnell, Thomas I. Strasser

Comments 2026 Open Source Modelling and Simulation of Energy Systems (OSMSES)

详情
英文摘要

Reproducibility, traceability, and transparency in testing cyber-physical energy systems are crucial for scientific advancement and cross-laboratory collaboration. Current experimentation and test documentation practices lack formal semantics, making it difficult to reproduce experiments, share data, and apply, for example, the artificial intelligence-driven analysis. A dataspace that relies on structured ontologies aims to address these gaps by providing machine-actionable descriptions. In this work, we outline an ontology-driven approach for reproducibility of cyber-physical energy systems testing and illustrate its applicability through representative cross-laboratory use cases, demonstrating feasibility while identifying remaining semantic and metadata gaps that limit reproducibility. Based on these observations, we propose an open three-viewpoint ontology framework to guide future ontology extensions.

2604.19685 2026-04-22 cs.CL

An Answer is just the Start: Related Insight Generation for Open-Ended Document-Grounded QA

Saransh Sharma, Pritika Ramu, Aparna Garimella, Koyel Mukherjee

Comments Paper accepted at ACL Findings 2026

详情
英文摘要

Answering open-ended questions remains challenging for AI systems because it requires synthesis, judgment, and exploration beyond factual retrieval, and users often refine answers through multiple iterations rather than accepting a single response. Existing QA benchmarks do not explicitly support this refinement process. To address this gap, we introduce a new task, document-grounded related insight generation, where the goal is to generate additional insights from a document collection that help improve, extend, or rethink an initial answer to an open-ended question, ultimately supporting richer user interaction and a better overall question answering experience. We curate and release SCOpE-QA (Scientific Collections for Open-Ended QA), a dataset of 3,000 open-ended questions across 20 research collections. We present InsightGen, a two-stage approach that first constructs a thematic representation of the document collection using clustering, and then selects related context based on neighborhood selection from the thematic graph to generate diverse and relevant insights using LLMs. Extensive evaluation on 3,000 questions using two generation models and two evaluation settings shows that InsightGen consistently produces useful, relevant, and actionable insights, establishing a strong baseline for this new task.

2604.19684 2026-04-22 cs.LG

PREF-XAI: Preference-Based Personalized Rule Explanations of Black-Box Machine Learning Models

Salvatore Greco, Jacek Karolczak, Roman Słowiński, Jerzy Stefanowski

详情
英文摘要

Explainable artificial intelligence (XAI) has predominantly focused on generating model-centric explanations that approximate the behavior of black-box models. However, such explanations often overlook a fundamental aspect of interpretability: different users require different explanations depending on their goals, preferences, and cognitive constraints. Although recent work has explored user-centric and personalized explanations, most existing approaches rely on heuristic adaptations or implicit user modeling, lacking a principled framework for representing and learning individual preferences. In this paper, we consider Preference-Based Explainable Artificial Intelligence (PREF-XAI), a novel perspective that reframes explanation as a preference-driven decision problem. Within PREF-XAI, explanations are not treated as fixed outputs, but as alternatives to be evaluated and selected according to user-specific criteria. In the PREF-XAI perspective, here we propose a methodology that combines rule-based explanations with formal preference learning. User preferences are elicited through a ranking of a small set of candidate explanations and modeled via an additive utility function inferred using robust ordinal regression. Experimental results on real-world datasets show that PREF-XAI can accurately reconstruct user preferences from limited feedback, identify highly relevant explanations, and discover novel explanatory rules not initially considered by the user. Beyond the proposed methodology, this work establishes a connection between XAI and preference learning, opening new directions for interactive and adaptive explanation systems.

2604.19680 2026-04-22 cs.CV

IR-Flow: Bridging Discriminative and Generative Image Restoration via Rectified Flow

Zihao Fan, Xin Lu, Jie Xiao, Dong Li, Jie Huang, Xueyang Fu

详情
英文摘要

In image restoration, single-step discriminative mappings often lack fine details via expectation learning, whereas generative paradigms suffer from inefficient multi-step sampling and noise-residual coupling. To address this dilemma, we propose IR-Flow, a novel image restoration method based on Rectified Flow that serves as a unified framework bridging the gap between discriminative and generative paradigms. Specifically, we first construct multilevel data distribution flows, which expand the ability of models to learn from and adapt to various levels of degradation. Subsequently, cumulative velocity fields are proposed to learn transport trajectories across varying degradation levels, guiding intermediate states toward the clean target, while a multi-step consistency constraint is presented to enforce trajectory coherence and boost few-step restoration performance. We show that directly establishing a linear transport flow between degraded and clean image domains not only enables fast inference but also improves adaptability to out-of-distribution degradations. Extensive evaluations on deraining, denoising and raindrop removal tasks demonstrate that IR-Flow achieves competitive quantitative results with only a few sampling steps, offering an efficient and flexible framework that maintains an excellent distortion-perception balance. Our code is available at https://github.com/fanzh03/IR-Flow.

2604.19678 2026-04-22 cs.CL

Exploring Language-Agnosticity in Function Vectors: A Case Study in Machine Translation

Nurkhan Laiyk, Gerard I. Gállego, Javier Ferrando, Fajri Koto

详情
英文摘要

Function vectors (FVs) are vector representations of tasks extracted from model activations during in-context learning. While prior work has shown that multilingual model representations can be language-agnostic, it remains unclear whether the same holds for function vectors. We study whether FVs exhibit language-agnosticity, using machine translation as a case study. Across three decoder-only multilingual LLMs, we find that translation FVs extracted from a single English$\rightarrow$Target direction transfer to other target languages, consistently improving the rank of correct translation tokens across multiple unseen languages. Ablation results show that removing the FV degrades translation across languages with limited impact on unrelated tasks. We further show that base-model FVs transfer to instruction-tuned variants and partially generalize from word-level to sentence-level translation.