arXivDaily arXiv每日学术速递 周一至周五更新
2602.10115 2026-02-11 cs.CV

Quantum Multiple Rotation Averaging

Shuteng Wang, Natacha Kuete Meli, Michael Möller, Vladislav Golyanik

Comments 16 pages, 13 figures, 4 tables; project page: https://4dqv.mpi-inf.mpg.de/QMRA/

Journal ref International Conference on 3D Vision (3DV) 2026

详情
英文摘要

Multiple rotation averaging (MRA) is a fundamental optimization problem in 3D vision and robotics that aims to recover globally consistent absolute rotations from noisy relative measurements. Established classical methods, such as L1-IRLS and Shonan, face limitations including local minima susceptibility and reliance on convex relaxations that fail to preserve the exact manifold geometry, leading to reduced accuracy in high-noise scenarios. We introduce IQARS (Iterative Quantum Annealing for Rotation Synchronization), the first algorithm that reformulates MRA as a sequence of local quadratic non-convex sub-problems executable on quantum annealers after binarization, to leverage inherent hardware advantages. IQARS removes convex relaxation dependence and better preserves non-Euclidean rotation manifold geometry while leveraging quantum tunneling and parallelism for efficient solution space exploration. We evaluate IQARS's performance on synthetic and real-world datasets. While current annealers remain in their nascent phase and only support solving problems of limited scale with constrained performance, we observed that IQARS on D-Wave annealers can already achieve ca. 12% higher accuracy than Shonan, i.e., the best-performing classical method evaluated empirically.

2602.10114 2026-02-11 cs.RO

Decoupled MPPI-Based Multi-Arm Motion Planning

Dan Evron, Elias Goldsztejn, Ronen I. Brafman

详情
英文摘要

Recent advances in sampling-based motion planning algorithms for high DOF arms leverage GPUs to provide SOTA performance. These algorithms can be used to control multiple arms jointly, but this approach scales poorly. To address this, we extend STORM, a sampling-based model-predictive-control (MPC) motion planning algorithm, to handle multiple robots in a distributed fashion. First, we modify STORM to handle dynamic obstacles. Then, we let each arm compute its own motion plan prefix, which it shares with the other arms, which treat it as a dynamic obstacle. Finally, we add a dynamic priority scheme. The new algorithm, MR-STORM, demonstrates clear empirical advantages over SOTA algorithms when operating with both static and dynamic obstacles.

2602.10113 2026-02-11 cs.CV

ConsID-Gen: View-Consistent and Identity-Preserving Image-to-Video Generation

Mingyang Wu, Ashirbad Mishra, Soumik Dey, Shuo Xing, Naveen Ravipati, Hansi Wu, Binbin Li, Zhengzhong Tu

Comments Project page: https://myangwu.github.io/ConsID-Gen

详情
英文摘要

Image-to-Video generation (I2V) animates a static image into a temporally coherent video sequence following textual instructions, yet preserving fine-grained object identity under changing viewpoints remains a persistent challenge. Unlike text-to-video models, existing I2V pipelines often suffer from appearance drift and geometric distortion, artifacts we attribute to the sparsity of single-view 2D observations and weak cross-modal alignment. Here we address this problem from both data and model perspectives. First, we curate ConsIDVid, a large-scale object-centric dataset built with a scalable pipeline for high-quality, temporally aligned videos, and establish ConsIDVid-Bench, where we present a novel benchmarking and evaluation framework for multi-view consistency using metrics sensitive to subtle geometric and appearance deviations. We further propose ConsID-Gen, a view-assisted I2V generation framework that augments the first frame with unposed auxiliary views and fuses semantic and structural cues via a dual-stream visual-geometric encoder as well as a text-visual connector, yielding unified conditioning for a Diffusion Transformer backbone. Experiments across ConsIDVid-Bench demonstrate that ConsID-Gen consistently outperforms in multiple metrics, with the best overall performance surpassing leading video generation models like Wan2.1 and HunyuanVideo, delivering superior identity fidelity and temporal coherence under challenging real-world scenarios. We will release our model and dataset at https://myangwu.github.io/ConsID-Gen.

2602.10111 2026-02-11 cs.RO

Learning Agile Quadrotor Flight in the Real World

Yunfan Ren, Zhiyuan Zhu, Jiaxu Xing, Davide Scaramuzza

详情
英文摘要

Learning-based controllers have achieved impressive performance in agile quadrotor flight but typically rely on massive training in simulation, necessitating accurate system identification for effective Sim2Real transfer. However, even with precise modeling, fixed policies remain susceptible to out-of-distribution scenarios, ranging from external aerodynamic disturbances to internal hardware degradation. To ensure safety under these evolving uncertainties, such controllers are forced to operate with conservative safety margins, inherently constraining their agility outside of controlled settings. While online adaptation offers a potential remedy, safely exploring physical limits remains a critical bottleneck due to data scarcity and safety risks. To bridge this gap, we propose a self-adaptive framework that eliminates the need for precise system identification or offline Sim2Real transfer. We introduce Adaptive Temporal Scaling (ATS) to actively explore platform physical limits, and employ online residual learning to augment a simple nominal model. {Based on the learned hybrid model, we further propose Real-world Anchored Short-horizon Backpropagation Through Time (RASH-BPTT) to achieve efficient and robust in-flight policy updates. Extensive experiments demonstrate that our quadrotor reliably executes agile maneuvers near actuator saturation limits. The system evolves a conservative base policy with a peak speed of 1.9 m/s to 7.3 m/s within approximately 100 seconds of flight time. These findings underscore that real-world adaptation serves not merely to compensate for modeling errors, but as a practical mechanism for sustained performance improvement in aggressive flight regimes.

2602.10109 2026-02-11 cs.RO

ST4VLA: Spatially Guided Training for Vision-Language-Action Models

Jinhui Ye, Fangjing Wang, Ning Gao, Junqiu Yu, Yangkun Zhu, Bin Wang, Jinyu Zhang, Weiyang Jin, Yanwei Fu, Feng Zheng, Yilun Chen, Jiangmiao Pang

Comments Spatially Training for VLA, Accepted by ICLR 2026

详情
英文摘要

Large vision-language models (VLMs) excel at multimodal understanding but fall short when extended to embodied tasks, where instructions must be transformed into low-level motor actions. We introduce ST4VLA, a dual-system Vision-Language-Action framework that leverages Spatial Guided Training to align action learning with spatial priors in VLMs. ST4VLA includes two stages: (i) spatial grounding pre-training, which equips the VLM with transferable priors via scalable point, box, and trajectory prediction from both web-scale and robot-specific data, and (ii) spatially guided action post-training, which encourages the model to produce richer spatial priors to guide action generation via spatial prompting. This design preserves spatial grounding during policy learning and promotes consistent optimization across spatial and action objectives. Empirically, ST4VLA achieves substantial improvements over vanilla VLA, with performance increasing from 66.1 -> 84.6 on Google Robot and from 54.7 -> 73.2 on WidowX Robot, establishing new state-of-the-art results on SimplerEnv. It also demonstrates stronger generalization to unseen objects and paraphrased instructions, as well as robustness to long-horizon perturbations in real-world settings. These results highlight scalable spatially guided training as a promising direction for robust, generalizable robot learning. Source code, data and models are released at https://internrobotics.github.io/internvla-m1.github.io/

2602.10105 2026-02-11 cs.RO

DexImit: Learning Bimanual Dexterous Manipulation from Monocular Human Videos

Juncheng Mu, Sizhe Yang, Yiming Bao, Hojin Bae, Tianming Wei, Linning Xu, Boyi Li, Huazhe Xu, Jiangmiao Pang

详情
英文摘要

Data scarcity fundamentally limits the generalization of bimanual dexterous manipulation, as real-world data collection for dexterous hands is expensive and labor-intensive. Human manipulation videos, as a direct carrier of manipulation knowledge, offer significant potential for scaling up robot learning. However, the substantial embodiment gap between human hands and robotic dexterous hands makes direct pretraining from human videos extremely challenging. To bridge this gap and unleash the potential of large-scale human manipulation video data, we propose DexImit, an automated framework that converts monocular human manipulation videos into physically plausible robot data, without any additional information. DexImit employs a four-stage generation pipeline: (1) reconstructing hand-object interactions from arbitrary viewpoints with near-metric scale; (2) performing subtask decomposition and bimanual scheduling; (3) synthesizing robot trajectories consistent with the demonstrated interactions; (4) comprehensive data augmentation for zero-shot real-world deployment. Building on these designs, DexImit can generate large-scale robot data based on human videos, either from the Internet or video generation models. DexImit is capable of handling diverse manipulation tasks, including tool use (e.g., cutting an apple), long-horizon tasks (e.g., making a beverage), and fine-grained manipulations (e.g., stacking cups).

2602.10102 2026-02-11 cs.CV

VideoWorld 2: Learning Transferable Knowledge from Real-world Videos

Zhongwei Ren, Yunchao Wei, Xiao Yu, Guixun Luo, Yao Zhao, Bingyi Kang, Jiashi Feng, Xiaojie Jin

Comments Code and models are released at: https://maverickren.github.io/VideoWorld2.github.io/

详情
英文摘要

Learning transferable knowledge from unlabeled video data and applying it in new environments is a fundamental capability of intelligent agents. This work presents VideoWorld 2, which extends VideoWorld and offers the first investigation into learning transferable knowledge directly from raw real-world videos. At its core, VideoWorld 2 introduces a dynamic-enhanced Latent Dynamics Model (dLDM) that decouples action dynamics from visual appearance: a pretrained video diffusion model handles visual appearance modeling, enabling the dLDM to learn latent codes that focus on compact and meaningful task-related dynamics. These latent codes are then modeled autoregressively to learn task policies and support long-horizon reasoning. We evaluate VideoWorld 2 on challenging real-world handcraft making tasks, where prior video generation and latent-dynamics models struggle to operate reliably. Remarkably, VideoWorld 2 achieves up to 70% improvement in task success rate and produces coherent long execution videos. In robotics, we show that VideoWorld 2 can acquire effective manipulation knowledge from the Open-X dataset, which substantially improves task performance on CALVIN. This study reveals the potential of learning transferable world knowledge directly from raw videos, with all code, data, and models to be open-sourced for further research.

2602.10099 2026-02-11 cs.LG cs.CV

Learning on the Manifold: Unlocking Standard Diffusion Transformers with Representation Encoders

Amandeep Kumar, Vishal M. Patel

Comments Technical Report

详情
英文摘要

Leveraging representation encoders for generative modeling offers a path for efficient, high-fidelity synthesis. However, standard diffusion transformers fail to converge on these representations directly. While recent work attributes this to a capacity bottleneck proposing computationally expensive width scaling of diffusion transformers we demonstrate that the failure is fundamentally geometric. We identify Geometric Interference as the root cause: standard Euclidean flow matching forces probability paths through the low-density interior of the hyperspherical feature space of representation encoders, rather than following the manifold surface. To resolve this, we propose Riemannian Flow Matching with Jacobi Regularization (RJF). By constraining the generative process to the manifold geodesics and correcting for curvature-induced error propagation, RJF enables standard Diffusion Transformer architectures to converge without width scaling. Our method RJF enables the standard DiT-B architecture (131M parameters) to converge effectively, achieving an FID of 3.37 where prior methods fail to converge. Code: https://github.com/amandpkr/RJF

2602.10097 2026-02-11 cs.LG cs.AI

Step-resolved data attribution for looped transformers

Georgios Kaissis, David Mildenberger, Juan Felipe Gomez, Martin J. Menten, Eleni Triantafillou

详情
英文摘要

We study how individual training examples shape the internal computation of looped transformers, where a shared block is applied for $τ$ recurrent iterations to enable latent reasoning. Existing training-data influence estimators such as TracIn yield a single scalar score that aggregates over all loop iterations, obscuring when during the recurrent computation a training example matters. We introduce \textit{Step-Decomposed Influence (SDI)}, which decomposes TracIn into a length-$τ$ influence trajectory by unrolling the recurrent computation graph and attributing influence to specific loop iterations. To make SDI practical at transformer scale, we propose a TensorSketch implementation that never materialises per-example gradients. Experiments on looped GPT-style models and algorithmic reasoning tasks show that SDI scales excellently, matches full-gradient baselines with low error and supports a broad range of data attribution and interpretability tasks with per-step insights into the latent reasoning process.

2602.10096 2026-02-11 cs.GT

The Complexity of Proper Equilibrium in Extensive-Form and Polytope Games

Brian Hu Zhang, Ioannis Anagnostides, Kiriaki Fragkia, Maria-Florina Balcan, Tuomas Sandholm

Comments This paper contains and extends results that were originally in a prior version of arXiv:2511.03968

详情
英文摘要

The proper equilibrium, introduced by Myerson (1978), is a classic refinement of the Nash equilibrium that has been referred to as the "mother of all refinements." For normal-form games, computing a proper equilibrium is known to be PPAD-complete for two-player games and FIXP$_a$-complete for games with at least three players. However, the complexity beyond normal-form games -- in particular, for extensive-form games (EFGs) -- was a long-standing open problem first highlighted by Miltersen and Sørensen (SODA '08). In this paper, we resolve this problem by establishing PPAD- and FIXP$_a$-membership (and hence completeness) of normal-form proper equilibria in two-player and multi-player EFGs respectively. Our main ingredient is a technique for computing a perturbed (proper) best response that can be computed efficiently in EFGs. This is despite the fact that, as we show, computing a best response using the classic perturbation of Kohlberg and Mertens based on the permutahedron is #P-hard even in Bayesian games. In stark contrast, we show that computing a proper equilibrium in polytope games is NP-hard. This marks the first natural class in which the complexity of computing equilibrium refinements does not collapse to that of Nash equilibria, and the first problem in which equilibrium computation in polytope games is strictly harder -- unless there is a collapse in the complexity hierarchy -- relative to extensive-form games.

2602.10095 2026-02-11 cs.CV cs.AI cs.LG

Causality in Video Diffusers is Separable from Denoising

Xingjian Bai, Guande He, Zhengqi Li, Eli Shechtman, Xun Huang, Zongze Wu

详情
英文摘要

Causality -- referring to temporal, uni-directional cause-effect relationships between components -- underlies many complex generative processes, including videos, language, and robot trajectories. Current causal diffusion models entangle temporal reasoning with iterative denoising, applying causal attention across all layers, at every denoising step, and over the entire context. In this paper, we show that the causal reasoning in these models is separable from the multi-step denoising process. Through systematic probing of autoregressive video diffusers, we uncover two key regularities: (1) early layers produce highly similar features across denoising steps, indicating redundant computation along the diffusion trajectory; and (2) deeper layers exhibit sparse cross-frame attention and primarily perform intra-frame rendering. Motivated by these findings, we introduce Separable Causal Diffusion (SCD), a new architecture that explicitly decouples once-per-frame temporal reasoning, via a causal transformer encoder, from multi-step frame-wise rendering, via a lightweight diffusion decoder. Extensive experiments on both pretraining and post-training tasks across synthetic and real benchmarks show that SCD significantly improves throughput and per-frame latency while matching or surpassing the generation quality of strong causal diffusion baselines.

2602.10094 2026-02-11 cs.CV

4RC: 4D Reconstruction via Conditional Querying Anytime and Anywhere

Yihang Luo, Shangchen Zhou, Yushi Lan, Xingang Pan, Chen Change Loy

Comments Project page: https://yihangluo.com/projects/4RC/

详情
英文摘要

We present 4RC, a unified feed-forward framework for 4D reconstruction from monocular videos. Unlike existing approaches that typically decouple motion from geometry or produce limited 4D attributes such as sparse trajectories or two-view scene flow, 4RC learns a holistic 4D representation that jointly captures dense scene geometry and motion dynamics. At its core, 4RC introduces a novel encode-once, query-anywhere and anytime paradigm: a transformer backbone encodes the entire video into a compact spatio-temporal latent space, from which a conditional decoder can efficiently query 3D geometry and motion for any query frame at any target timestamp. To facilitate learning, we represent per-view 4D attributes in a minimally factorized form by decomposing them into base geometry and time-dependent relative motion. Extensive experiments demonstrate that 4RC outperforms prior and concurrent methods across a wide range of 4D reconstruction tasks.

2602.10093 2026-02-11 cs.RO

UniVTAC: A Unified Simulation Platform for Visuo-Tactile Manipulation Data Generation, Learning, and Benchmarking

Baijun Chen, Weijie Wan, Tianxing Chen, Xianda Guo, Congsheng Xu, Yuanyang Qi, Haojie Zhang, Longyan Wu, Tianling Xu, Zixuan Li, Yizhe Wu, Rui Li, Xiaokang Yang, Ping Luo, Wei Sui, Yao Mu

Comments Website: https://univtac.github.io/

详情
英文摘要

Robotic manipulation has seen rapid progress with vision-language-action (VLA) policies. However, visuo-tactile perception is critical for contact-rich manipulation, as tasks such as insertion are difficult to complete robustly using vision alone. At the same time, acquiring large-scale and reliable tactile data in the physical world remains costly and challenging, and the lack of a unified evaluation platform further limits policy learning and systematic analysis. To address these challenges, we propose UniVTAC, a simulation-based visuo-tactile data synthesis platform that supports three commonly used visuo-tactile sensors and enables scalable and controllable generation of informative contact interactions. Based on this platform, we introduce the UniVTAC Encoder, a visuo-tactile encoder trained on large-scale simulation-synthesized data with designed supervisory signals, providing tactile-centric visuo-tactile representations for downstream manipulation tasks. In addition, we present the UniVTAC Benchmark, which consists of eight representative visuo-tactile manipulation tasks for evaluating tactile-driven policies. Experimental results show that integrating the UniVTAC Encoder improves average success rates by 17.1% on the UniVTAC Benchmark, while real-world robotic experiments further demonstrate a 25% improvement in task success. Our webpage is available at https://univtac.github.io/.

2602.10092 2026-02-11 cs.CL

Quantum-Audit: Evaluating the Reasoning Limits of LLMs on Quantum Computing

Mohamed Afane, Kayla Laufer, Wenqi Wei, Ying Mao, Junaid Farooq, Ying Wang, Juntao Chen

Comments 18 pages

详情
英文摘要

Language models have become practical tools for quantum computing education and research, from summarizing technical papers to explaining theoretical concepts and answering questions about recent developments in the field. While existing benchmarks evaluate quantum code generation and circuit design, their understanding of quantum computing concepts has not been systematically measured. Quantum-Audit addresses this gap with 2,700 questions covering core quantum computing topics. We evaluate 26 models from leading organizations. Our benchmark comprises 1,000 expert-written questions, 1,000 questions extracted from research papers using LLMs and validated by experts, plus an additional 700 questions including 350 open-ended questions and 350 questions with false premises to test whether models can correct erroneous assumptions. Human participants scored between 23% and 86%, with experts averaging 74%. Top-performing models exceeded the expert average, with Claude Opus 4.5 reaching 84% accuracy, though top models showed an average 12-point accuracy drop on expert-written questions compared to LLM-generated ones. Performance declined further on advanced topics, dropping to 73% on security questions. Additionally, models frequently accepted and reinforced false premises embedded in questions instead of identifying them, with accuracy below 66% on these critical reasoning tasks.

2602.10083 2026-02-11 cs.GT

Allocation Proportionality of OWA--Based Committee Scoring Rules

Daria Boratyn, Dariusz Stolicki

Comments 17 pages, 4 figures

详情
英文摘要

While proportionality is frequently named as a desirable property of voting rules, its interpretation in multiwinner voting differs significantly from that in apportionment. We aim to bridge these two distinct notions of proportionality by introducing the concept of allocation proportionality, founded upon the framework of party elections, where each candidate in a multiwinner election is assigned to a party. A voting rule is allocation proportional if each party's share of elected candidates equals that party's aggregate score. Recognizing that no committee scoring rule can universally satisfy allocation proportionality in practice, we introduce a new measure of allocation proportionality degree and discuss how it relates to other quantitative measures of proportionality. This measure allows us to compare OWA-based committee scoring rules according to how much they diverge from the ideal of allocation proportionality. We present experimental results for several common rules: SNTV, $k$-Borda, Chamberlin-Courant, Harmonic Borda, Proportional $k$-Approval Voting, and Bloc Voting.

2602.10080 2026-02-11 cs.DS

Beyond a Single Queue: Multi-Level-Multi-Queue as an Effective Design for SSSP problems on GPUs

Zhengding Hu, Jingwen Sun, Le Jiang, Yuhao Wang, Junqing Lin, Yi Zong, Guangzhong Sun

详情
英文摘要

As one of the most fundamental problems in graph processing, the Single-Source Shortest Path (SSSP) problem plays a critical role in numerous application scenarios. However, existing GPU-based solutions remain inefficient, as they typically rely on a single, fixed queue design that incurs severe synchronization overhead, high memory latency, and poor adaptivity to diverse inputs. To address these inefficiencies, we propose MultiLevelMultiQueue (MLMQ), a novel data structure that distributes multiple queues across the GPU's multi-level parallelism and memory hierarchy. To realize MLMQ, we introduce a cache-like collaboration mechanism for efficient inter-queue coordination, and develop a modular queue design based on unified Read and Write primitives. Within this framework, we expand the optimization space by designing a set of GPU-friendly queues, composing them across multiple levels, and further providing an input-adaptive MLMQ configuration scheme. Our MLMQ design achieves average speedups of 1.87x to 17.13x over state-of-the-art implementations. Our code is open-sourced at https://github.com/Leo9660/MLMQ.git.

2602.10079 2026-02-11 cs.CV

Can Image Splicing and Copy-Move Forgery Be Detected by the Same Model? Forensim: An Attention-Based State-Space Approach

Soumyaroop Nandi, Prem Natarajan

详情
英文摘要

We introduce Forensim, an attention-based state-space framework for image forgery detection that jointly localizes both manipulated (target) and source regions. Unlike traditional approaches that rely solely on artifact cues to detect spliced or forged areas, Forensim is designed to capture duplication patterns crucial for understanding context. In scenarios such as protest imagery, detecting only the forged region, for example a duplicated act of violence inserted into a peaceful crowd, can mislead interpretation, highlighting the need for joint source-target localization. Forensim outputs three-class masks (pristine, source, target) and supports detection of both splicing and copy-move forgeries within a unified architecture. We propose a visual state-space model that leverages normalized attention maps to identify internal similarities, paired with a region-based block attention module to distinguish manipulated regions. This design enables end-to-end training and precise localization. Forensim achieves state-of-the-art performance on standard benchmarks. We also release CMFD-Anything, a new dataset addressing limitations of existing copy-move forgery datasets.

2602.10074 2026-02-11 cs.CR cs.CL

CAPID: Context-Aware PII Detection for Question-Answering Systems

Mariia Ponomarenko, Sepideh Abedini, Masoumeh Shafieinejad, D. B. Emerson, Shubhankar Mohapatra, Xi He

Comments Accepted to the Student Research Workshop at EACL 2026

详情
英文摘要

Detecting personally identifiable information (PII) in user queries is critical for ensuring privacy in question-answering systems. Current approaches mainly redact all PII, disregarding the fact that some of them may be contextually relevant to the user's question, resulting in a degradation of response quality. Large language models (LLMs) might be able to help determine which PII are relevant, but due to their closed source nature and lack of privacy guarantees, they are unsuitable for sensitive data processing. To achieve privacy-preserving PII detection, we propose CAPID, a practical approach that fine-tunes a locally owned small language model (SLM) that filters sensitive information before it is passed to LLMs for QA. However, existing datasets do not capture the context-dependent relevance of PII needed to train such a model effectively. To fill this gap, we propose a synthetic data generation pipeline that leverages LLMs to produce a diverse, domain-rich dataset spanning multiple PII types and relevance levels. Using this dataset, we fine-tune an SLM to detect PII spans, classify their types, and estimate contextual relevance. Our experiments show that relevance-aware PII detection with a fine-tuned SLM substantially outperforms existing baselines in span, relevance and type accuracy while preserving significantly higher downstream utility under anonymization.

2602.10073 2026-02-11 cs.CC

Some conditions implying if P=NP then P=PSPACE

Ismael Rodriguez

详情
英文摘要

We identify a few conditions $X$ such that $(P=NP \wedge X) \;\Rightarrow\; P=PSPACE$.

2602.10054 2026-02-11 cs.HC

AIDED: Augmenting Interior Design with Human Experience Data for Designer-AI Co-Design

Yang Chen Lin, Chen-Ying Chen, Kai-Hsin Hou, Hung-Yu Chen, Po-Chih Kuo

Comments 29 pages, 14 figures, Accepted to CHI 2026

详情
英文摘要

Interior design often struggles to capture the subtleties of client experience, leaving gaps between what clients feel and what designers can act upon. We present AIDED, a designer-AI co-design workflow that integrates multimodal client data into generative AI (GAI) design processes. In a within-subjects study with twelve professional designers, we compared four modalities: baseline briefs, gaze heatmaps, questionnaire visualizations, and AI-predicted overlays. Results show that questionnaire data were trusted, creativity-enhancing, and satisfying; gaze heatmaps increased cognitive load; and AI-predicted overlays improved GAI communication but required natural language mediation to establish trust. Interviews confirmed that an authenticity-interpretability trade-off is central to balancing client voices with professional control. Our contributions are: (1) a system that incorporates experiential client signals into GAI design workflows; (2) empirical evidence of how different modalities affect design outcomes; and (3) implications for future AI tools that support human-data interaction in creative practice.

2602.10050 2026-02-11 cs.DS

Maximizing Diversity in (near-)Median String Selection

Diptarka Chakraborty, Rudrayan Kundu, Nidhi Purohit, Aravinda Kanchana Ruwanpathirana

Comments Accepted for publication at CPM 2026

详情
英文摘要

Given a set of strings over a specified alphabet, identifying a median or consensus string that minimizes the total distance to all input strings is a fundamental data aggregation problem. When the Hamming distance is considered as the underlying metric, this problem has extensive applications, ranging from bioinformatics to pattern recognition. However, modern applications often require the generation of multiple (near-)optimal yet diverse median strings to enhance flexibility and robustness in decision-making. In this study, we address this need by focusing on two prominent diversity measures: sum dispersion and min dispersion. We first introduce an exact algorithm for the diameter variant of the problem, which identifies pairs of near-optimal medians that are maximally diverse. Subsequently, we propose a $(1-ε)$-approximation algorithm (for any $ε>0$) for sum dispersion, as well as a bi-criteria approximation algorithm for the more challenging min dispersion case, allowing the generation of multiple (more than two) diverse near-optimal Hamming medians. Our approach primarily leverages structural insights into the Hamming median space and also draws on techniques from error-correcting code construction to establish these results.

2602.10046 2026-02-11 cs.SE

Artisan: Agentic Artifact Evaluation

Doehyun Baek, Michael Pradel

详情
英文摘要

Artifact evaluation has become standard practice in the software engineering community to ensure the reproducibility of research results. However, the current manual process is labor-intensive, and hence, done only as a one-time assessment for a subset of all papers. To support the artifact evaluation effort, we present Artisan, an automated LLM agent for reproducing research results given a paper and its artifact. The approach is enabled by two key contributions: First, we frame the reproduction problem as a code generation task where the goal is to generate a reproduction script that, when executed, reproduces the results reported in a paper. Unlike prior work on automatically reproducing research results in other domains, this formulation allows for running the script independently of the agent and for assessing the reproduction process at a fine-grained level. Second, we design automated judging mechanism that guides the agent toward the expected results without revealing them and that prevent trivial solutions, such as simply copying checked-in results. To evaluate Artisan, we introduce Artisan-Bench, the first benchmark assessing the ability to generate reproduction scripts and the first benchmark for automated artifact evaluation in software engineering. Artisan-Bench comprises 60 tasks derived from 23 software engineering papers, covering different research areas and programming languages. We validate all tasks in Artisan-Bench for reproducibility to ensure that the tasks are feasible. Our experiments show that Artisan is effective, producing 44/60 reproduction scripts and outperforming the best available baseline, a vanilla LLM agent (mini-swe-agent), by 3.14$\times$ in terms of reproduction scripts generated while taking $0.45 and 48 minutes, on average per task. Artisan also helped uncover 20 new errors in either the paper or artifact.

2602.10045 2026-02-11 cs.CV cs.LG stat.ME stat.ML

Conformal Prediction Sets for Instance Segmentation

Kerri Lu, Dan M. Kluger, Stephen Bates, Sherrie Wang

详情
英文摘要

Current instance segmentation models achieve high performance on average predictions, but lack principled uncertainty quantification: their outputs are not calibrated, and there is no guarantee that a predicted mask is close to the ground truth. To address this limitation, we introduce a conformal prediction algorithm to generate adaptive confidence sets for instance segmentation. Given an image and a pixel coordinate query, our algorithm generates a confidence set of instance predictions for that pixel, with a provable guarantee for the probability that at least one of the predictions has high Intersection-Over-Union (IoU) with the true object instance mask. We apply our algorithm to instance segmentation examples in agricultural field delineation, cell segmentation, and vehicle detection. Empirically, we find that our prediction sets vary in size based on query difficulty and attain the target coverage, outperforming existing baselines such as Learn Then Test, Conformal Risk Control, and morphological dilation-based methods. We provide versions of the algorithm with asymptotic and finite sample guarantees.

2602.10044 2026-02-11 cs.LG cs.AI cs.SY eess.SY

Optimistic World Models: Efficient Exploration in Model-Based Deep Reinforcement Learning

Akshay Mete, Shahid Aamir Sheikh, Tzu-Hsiang Lin, Dileep Kalathil, P. R. Kumar

详情
英文摘要

Efficient exploration remains a central challenge in reinforcement learning (RL), particularly in sparse-reward environments. We introduce Optimistic World Models (OWMs), a principled and scalable framework for optimistic exploration that brings classical reward-biased maximum likelihood estimation (RBMLE) from adaptive control into deep RL. In contrast to upper confidence bound (UCB)-style exploration methods, OWMs incorporate optimism directly into model learning by augmentation with an optimistic dynamics loss that biases imagined transitions toward higher-reward outcomes. This fully gradient-based loss requires neither uncertainty estimates nor constrained optimization. Our approach is plug-and-play with existing world model frameworks, preserving scalability while requiring only minimal modifications to standard training procedures. We instantiate OWMs within two state-of-the-art world model architectures, leading to Optimistic DreamerV3 and Optimistic STORM, which demonstrate significant improvements in sample efficiency and cumulative return compared to their baseline counterparts.

2602.10043 2026-02-11 cs.CV

Simple Image Processing and Similarity Measures Can Link Data Samples across Databases through Brain MRI

Gaurang Sharma, Harri Polonen, Juha Pajula, Jutta Suksi, Jussi Tohka

详情
英文摘要

Head Magnetic Resonance Imaging (MRI) is routinely collected and shared for research under strict regulatory frameworks. These frameworks require removing potential identifiers before sharing. But, even after skull stripping, the brain parenchyma contains unique signatures that can match other MRIs from the same participants across databases, posing a privacy risk if additional data features are available. Current regulatory frameworks often mandate evaluating such risks based on the assessment of a certain level of reasonableness. Prior studies have already suggested that a brain MRI could enable participant linkage, but they have relied on training-based or computationally intensive methods. Here, we demonstrate that linking an individual's skull-stripped T1-weighted MRI, which may lead to re-identification if other identifiers are available, is possible using standard preprocessing followed by image similarity computation. Nearly perfect linkage accuracy was achieved in matching data samples across various time intervals, scanner types, spatial resolutions, and acquisition protocols, despite potential cognitive decline, simulating MRI matching across databases. These results aim to contribute meaningfully to the development of thoughtful, forward-looking policies in medical data sharing.

2602.10039 2026-02-11 cs.CY

Budgeting Discretion: Theory and Evidence on Street-Level Decision-Making

Gaurab Pokharel, Sanmay Das, Patrick J. Fowler

详情
英文摘要

Street-level bureaucrats, such as caseworkers and border guards routinely face the dilemma of whether to follow rigid policy or exercise discretion based on professional judgement. However, frequent overrides threaten consistency and introduce bias, explaining why bureaucracies often ration discretion as a finite resource. While prior work models discretion as a static cost-benefit tradeoff, we lack a principled model of how discretion should be rationed over time under real operational constraints. We formalize discretion as a dynamic allocation problem in which an agent receives stochastic opportunities to improve upon a default policy and must spend a limited override budget K over a finite horizon T. We show that overrides follow a dynamic threshold rule: use discretion only when the opportunity exceeds a time and budget-dependent cutoff. Our main theoretical contribution identifies a behavioral invariance: for location-scale families of improvement distributions, the rate at which an optimal agent exercises discretion is independent of the scale of potential gains and depends only on the distribution's shape (e.g., tail heaviness). This result implies systematic differences in discretionary "policy personality." When gains are fat-tailed, optimal agents are patient, conserving discretion for outliers. When gains are thin-tailed, agents spend more routinely. We illustrate these implications using data from a homelessness services system. Discretionary overrides track operational constraints: they are higher at the start of the workweek, suppressed on weekends when intake is offline, and shift with short-run housing capacity. These results suggest that discretion can be both procedurally constrained and welfare-improving when treated as an explicitly budgeted resource, providing a foundation for auditing override patterns and designing decision-support systems.

2602.10037 2026-02-11 cs.LG cond-mat.stat-mech quant-ph

Effectiveness of Binary Autoencoders for QUBO-Based Optimization Problems

Tetsuro Abe, Masashi Yamashita, Shu Tanaka

Comments 14 pages, 5 figures

详情
英文摘要

In black-box combinatorial optimization, objective evaluations are often expensive, so high quality solutions must be found under a limited budget. Factorization machine with quantum annealing (FMQA) builds a quadratic surrogate model from evaluated samples and optimizes it on an Ising machine. However, FMQA requires binary decision variables, and for nonbinary structures such as integer permutations, the choice of binary encoding strongly affects search efficiency. If the encoding fails to reflect the original neighborhood structure, small Hamming moves may not correspond to meaningful modifications in the original solution space, and constrained problems can yield many infeasible candidates that waste evaluations. Recent work combines FMQA with a binary autoencoder (bAE) that learns a compact binary latent code from feasible solutions, yet the mechanism behind its performance gains is unclear. Using a small traveling salesman problem as an interpretable testbed, we show that the bAE reconstructs feasible tours accurately and, compared with manually designed encodings at similar compression, better aligns tour distances with latent Hamming distances, yields smoother neighborhoods under small bit flips, and produces fewer local optima. These geometric properties explain why bAE+FMQA improves the approximation ratio faster while maintaining feasibility throughout optimization, and they provide guidance for designing latent representations for black-box optimization.

2602.10036 2026-02-11 cs.FL

Automata on Graph Alphabets

Hugo Bazille, Uli Fahrenberg

详情
英文摘要

The theory of finite automata concerns itself with words in a free monoid together with concatenation and without further structure. There are, however, important applications which use alphabets which are structured in some sense. We introduce automata over a particular type of structured data, namely an alphabet which is given as a (finite or infinite) directed graph. This constrains concatenation: two strings may only be concatenated if the end vertex of the first is equal to the start vertex of the second. We develop the beginnings of an automata theory for languages on graph alphabets. We show that they admit a Kleene theorem, relating rational and regular languages, and a Myhill-Nerode theorem, stating that languages are regular iff they have finite prefix or, equivalently, suffix quotient. We present determinization and minimization algorithms, but we also exhibit that regular languages are not stable by complementation. Finally, we mention how these structures could be generalized to presimplicial alphabets, where languages are no more freely generated.

2602.10035 2026-02-11 cs.RO

A Collision-Free Sway Damping Model Predictive Controller for Safe and Reactive Forestry Crane Navigation

Marc-Philip Ecker, Christoph Fröhlich, Johannes Huemer, David Gruber, Bernhard Bischof, Tobias Glück, Wolfgang Kemmetmüller

Comments Accepted at ICRA 2026

详情
英文摘要

Forestry cranes operate in dynamic, unstructured outdoor environments where simultaneous collision avoidance and payload sway control are critical for safe navigation. Existing approaches address these challenges separately, either focusing on sway damping with predefined collision-free paths or performing collision avoidance only at the global planning level. We present the first collision-free, sway-damping model predictive controller (MPC) for a forestry crane that unifies both objectives in a single control framework. Our approach integrates LiDAR-based environment mapping directly into the MPC using online Euclidean distance fields (EDF), enabling real-time environmental adaptation. The controller simultaneously enforces collision constraints while damping payload sway, allowing it to (i) replan upon quasi-static environmental changes, (ii) maintain collision-free operation under disturbances, and (iii) provide safe stopping when no bypass exists. Experimental validation on a real forestry crane demonstrates effective sway damping and successful obstacle avoidance. A video can be found at https://youtu.be/tEXDoeLLTxA.

2602.10030 2026-02-11 cs.CC

Optimal PRGs for Low-Degree Polynomials over Polynomial-Size Fields

Gil Cohen, Dean Doron, Noam Goldgraber

Comments arXiv admin note: text overlap with arXiv:2402.11915 by other authors

详情
英文摘要

Pseudorandom generators (PRGs) for low-degree polynomials are a central object in pseudorandomness, with applications to circuit lower bounds and derandomization. Viola's celebrated construction gives a PRG over the binary field, but with seed length exponential in the degree $d$. This exponential dependence can be avoided over sufficiently large fields. In particular, Dwivedi, Guo, and Volk constructed PRGs with optimal seed length over fields of size exponential in $d$. The latter builds on the framework of Derksen and Viola, who obtained optimal-seed constructions over fields of size polynomial in $d$, although growing with the number of variables $n$. In this work, we construct the first PRG with optimal seed length for degree-$d$ polynomials over fields of polynomial size, specifically $q \approx d^4$, assuming sufficiently large characteristic. Our construction follows the framework of prior work and reduces the required field size by replacing the hitting-set generator used in previous constructions with a new pseudorandom object. We also observe a threshold phenomenon in the field-size dependence. Specifically, we prove that constructing PRGs over fields of sublinear size, for example $q = d^{0.99}$ where $q$ is a power of two, would already yield PRGs for the binary field with comparable seed length via our reduction, provided that the construction imposes no restriction on the characteristic. While a breakdown of existing techniques has been noted before, we prove that this phenomenon is inherent to the problem itself, irrespective of the technique used.