arXivDaily: arXiv Daily Academic Digest (updated Monday through Friday)
2603.26666 2026-03-30 cs.RO

VLA-OPD: Bridging Offline SFT and Online RL for Vision-Language-Action Models via On-Policy Distillation

Zhide Zhong, Haodong Yan, Junfeng Li, Junjie He, Tianran Zhang, Haoang Li

Abstract

Although pre-trained Vision-Language-Action (VLA) models exhibit impressive generalization in robotic manipulation, post-training remains crucial to ensure reliable performance during deployment. However, standard offline Supervised Fine-Tuning (SFT) suffers from distribution shifts and catastrophic forgetting of pre-trained capabilities, while online Reinforcement Learning (RL) struggles with sparse rewards and poor sample efficiency. In this paper, we propose On-Policy VLA Distillation (VLA-OPD), a framework bridging the efficiency of SFT with the robustness of RL. Instead of relying on sparse environmental rewards, VLA-OPD leverages an expert teacher to provide dense, token-level supervision on the student's self-generated trajectories. This enables active error correction on policy-induced states while preserving pre-trained general capabilities through gentle alignment. Crucially, we formulate VLA-OPD via a Reverse-KL objective. Unlike standard Forward-KL that induces mode-covering entropy explosion, or Hard-CE that causes premature entropy collapse, our bounded mode-seeking objective ensures stable policy learning by filtering out the teacher's epistemic uncertainty while maintaining action diversity. Experiments on LIBERO and RoboTwin2.0 benchmarks demonstrate that VLA-OPD significantly improves sample efficiency over RL and robustness over SFT, while effectively mitigating catastrophic forgetting during post-training.
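
The contrast the abstract draws between Forward-KL (mode-covering) and Reverse-KL (mode-seeking) can be seen on a toy discrete action distribution. The sketch below is not the paper's token-level objective; the teacher and student distributions and the function name `kl` are assumptions chosen purely for illustration.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical teacher: two strong action modes plus a little residual mass.
teacher = [0.495, 0.495, 0.01]
# Two hypothetical students: one commits to a single mode, one spreads evenly.
mode_seeking = [0.98, 0.01, 0.01]
mode_covering = [1 / 3, 1 / 3, 1 / 3]

for name, student in [("mode_seeking", mode_seeking), ("mode_covering", mode_covering)]:
    forward = kl(teacher, student)  # Forward-KL: KL(teacher || student)
    reverse = kl(student, teacher)  # Reverse-KL: KL(student || teacher)
    print(f"{name}: forward={forward:.3f} reverse={reverse:.3f}")
```

In this toy case Forward-KL scores the spread-out student as closer to the teacher, while Reverse-KL prefers the student that commits to one of the teacher's modes, which is the general behaviour the abstract refers to.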

2603.26665 2026-03-30 cs.CV

Detailed Geometry and Appearance from Opportunistic Motion

Ryosuke Hirai, Kohei Yamashita, Antoine Guédon, Ryo Kawahara, Vincent Lepetit, Ko Nishino

Abstract

Reconstructing 3D geometry and appearance from a sparse set of fixed cameras is a foundational task with broad applications, yet it remains fundamentally constrained by the limited viewpoints. We show that this bound can be broken by exploiting opportunistic object motion: as a person manipulates an object (e.g., moving a chair or lifting a mug), the static cameras effectively "orbit" the object in its local coordinate frame, providing additional virtual viewpoints. Harnessing this object motion, however, poses two challenges: the tight coupling of object pose and geometry estimation and the complex appearance variations of a moving object under static illumination. We address these by formulating a joint pose and shape optimization using 2D Gaussian splatting with alternating minimization of 6DoF trajectories and primitive parameters, and by introducing a novel appearance model that factorizes diffuse and specular components with reflected directional probing within the spherical harmonics space. Extensive experiments on synthetic and real-world datasets with extremely sparse viewpoints demonstrate that our method recovers significantly more accurate geometry and appearance than state-of-the-art baselines.

2603.26664 2026-03-30 cs.SE cs.CL

Learning to Commit: Generating Organic Pull Requests via Online Repository Memory

Mo Li, L. H. Xu, Qitai Tan, Ting Cao, Yunxin Liu

Comments Preprint. Work in progress

Abstract

Large language model (LLM)-based coding agents achieve impressive results on controlled benchmarks yet routinely produce pull requests that real maintainers reject. The root cause is not functional incorrectness but a lack of organicity: generated code ignores project-specific conventions, duplicates functionality already provided by internal APIs, and violates implicit architectural constraints accumulated over years of development. Simply exposing an agent to the latest repository snapshot is not enough: the snapshot reveals the final state of the codebase, but not the repository-specific change patterns by which that state was reached. We introduce Learning to Commit, a framework that closes this gap through Online Repository Memory. Given a repository with a strict chronological split, the agent performs supervised contrastive reflection on earlier commits: it blindly attempts to resolve each historical issue, compares its prediction against the oracle diff, and distils the gap into a continuously growing set of skills: reusable patterns capturing coding style, internal API usage, and architectural invariants. When a new PR description arrives, the agent conditions its generation on these accumulated skills, producing changes grounded in the project's own evolution rather than generic pretraining priors. Evaluation is conducted on genuinely future, merged pull requests that could not have been seen during the skill-building phase, and spans multiple dimensions including functional correctness, code-style consistency, internal API reuse rate, and modified-region plausibility. Experiments on an expert-maintained repository with rich commit history show that Online Repository Memory effectively improves organicity scores on held-out future tasks.

2603.26663 2026-03-30 cs.CL

Weight Tying Biases Token Embeddings Towards the Output Space

Antonio Lopardo, Avyukth Harish, Catherine Arnett, Akshat Gupta

Abstract

Weight tying, i.e. sharing parameters between input and output embedding matrices, is common practice in language model design, yet its impact on the learned embedding space remains poorly understood. In this paper, we show that tied embedding matrices align more closely with output (unembedding) matrices than with input embeddings of comparable untied models, indicating that the shared matrix is shaped primarily for output prediction rather than input representation. This unembedding bias arises because output gradients dominate early in training. Using tuned lens analysis, we show this negatively affects early-layer computations, which contribute less effectively to the residual stream. Scaling input gradients during training reduces this bias, providing causal evidence for the role of gradient imbalance. This is mechanistic evidence that weight tying optimizes the embedding matrix for output prediction, compromising its role in input representation. These results help explain why weight tying can harm performance at scale and have implications for training smaller LLMs, where the embedding matrix contributes substantially to total parameter count.
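
To make the definition of weight tying concrete, here is a minimal sketch in plain Python. The sizes, the random initialisation, and the helper names `embed` and `unembed` are assumptions for illustration only; they are not taken from the paper.

```python
import random

random.seed(0)
vocab_size, d_model = 5, 3  # toy sizes (assumed)

# One shared matrix E of shape (vocab_size, d_model).
E = [[random.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(vocab_size)]

def embed(token_id, input_matrix):
    """Input side: look up the row for a token."""
    return input_matrix[token_id]

def unembed(hidden, output_matrix):
    """Output side: logits are dot products of the hidden state with every row."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in output_matrix]

# Tied: the same matrix serves both roles, so gradients from the input
# lookup and from the output logits would both update E.
tied_logits = unembed(embed(2, E), E)

# Untied: a separate output matrix U is free to differ from E.
U = [[random.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(vocab_size)]
untied_logits = unembed(embed(2, E), U)
```

With tying and no intermediate layers, the logit of token 2 against its own embedding is the squared norm of that row, a direct consequence of one matrix playing both roles.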

2603.26661 2026-03-30 cs.CV

GaussianGPT: Towards Autoregressive 3D Gaussian Scene Generation

Nicolas von Lützow, Barbara Rössle, Katharina Schmid, Matthias Nießner

Comments Project page: https://nicolasvonluetzow.github.io/GaussianGPT/ - Project video: https://youtu.be/zVnMHkFzHDg

Abstract

Most recent advances in 3D generative modeling rely on diffusion or flow-matching formulations. We instead explore a fully autoregressive alternative and introduce GaussianGPT, a transformer-based model that directly generates 3D Gaussians via next-token prediction, thus facilitating full 3D scene generation. We first compress Gaussian primitives into a discrete latent grid using a sparse 3D convolutional autoencoder with vector quantization. The resulting tokens are serialized and modeled using a causal transformer with 3D rotary positional embedding, enabling sequential generation of spatial structure and appearance. Unlike diffusion-based methods that refine scenes holistically, our formulation constructs scenes step-by-step, naturally supporting completion, outpainting, controllable sampling via temperature, and flexible generation horizons. This formulation leverages the compositional inductive biases and scalability of autoregressive modeling while operating on explicit representations compatible with modern neural rendering pipelines, positioning autoregressive transformers as a complementary paradigm for controllable and context-aware 3D generation.
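
The "controllable sampling via temperature" mentioned above is a standard property of next-token models. A generic sketch follows; the logits and function names are hypothetical and this is not GaussianGPT's actual sampling code.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

toy_logits = [2.0, 1.0, 0.0]  # hypothetical scores over three codebook tokens
p_sharp = softmax_with_temperature(toy_logits, 0.5)
p_flat = softmax_with_temperature(toy_logits, 2.0)
```

A low temperature concentrates mass on the top-scoring token (more deterministic scenes), while a high temperature flattens the distribution (more varied samples).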

2603.26659 2026-03-30 cs.RO

Partial Motion Imitation for Learning Cart Pushing with Legged Manipulators

Mili Das, Morgan Byrd, Donghoon Baek, Sehoon Ha

Comments 8 pages, 5 figures

Abstract

Loco-manipulation is a key capability for legged robots to perform practical mobile manipulation tasks, such as transporting and pushing objects, in real-world environments. However, learning robust loco-manipulation skills remains challenging due to the difficulty of maintaining stable locomotion while simultaneously performing precise manipulation behaviors. This work proposes a partial imitation learning approach that transfers the locomotion style learned from a locomotion task to cart loco-manipulation. A robust locomotion policy is first trained with extensive domain and terrain randomization, and a loco-manipulation policy is then learned by imitating only lower-body motions using a partial adversarial motion prior. We conduct experiments demonstrating that the learned policy successfully pushes a cart along diverse trajectories in IsaacLab and transfers effectively to MuJoCo. We also compare our method to several baselines and show that the proposed approach achieves more stable and accurate loco-manipulation behaviors.

2603.26658 2026-03-30 cs.CV

Zero-Shot Depth from Defocus

Yiming Zuo, Hongyu Wen, Venkat Subramanian, Patrick Chen, Karhan Kayan, Mario Bijelic, Felix Heide, Jia Deng

Abstract

Depth from Defocus (DfD) is the task of estimating a dense metric depth map from a focus stack. Unlike previous works overfitting to a certain dataset, this paper focuses on the challenging and practical setting of zero-shot generalization. We first propose a new real-world DfD benchmark, ZEDD, which contains 8.3x more scenes and significantly higher quality images and ground-truth depth maps compared to previous benchmarks. We also design a novel network architecture named FOSSA. FOSSA is a Transformer-based architecture with novel designs tailored to the DfD task. The key contribution is a stack attention layer with a focus distance embedding, allowing efficient information exchange across the focus stack. Finally, we develop a new training data pipeline allowing us to utilize existing large-scale RGBD datasets to generate synthetic focus stacks. Experimental results on ZEDD and other benchmarks show a significant improvement over the baselines, reducing errors by up to 55.7%. The ZEDD benchmark is released at https://zedd.cs.princeton.edu. The code and checkpoints are released at https://github.com/princeton-vl/FOSSA.

2603.26657 2026-03-30 cs.CV cs.LG

Tunable Soft Equivariance with Guarantees

Md Ashiqur Rahman, Lim Jun Hao, Jeremiah Jiang, Teck-Yian Lim, Raymond A. Yeh

Abstract

Equivariance is a fundamental property in computer vision models, yet strict equivariance is rarely satisfied in real-world data, which can limit a model's performance. Controlling the degree of equivariance is therefore desirable. We propose a general framework for constructing soft equivariant models by projecting the model weights into a designed subspace. The method applies to any pre-trained architecture and provides theoretical bounds on the induced equivariance error. Empirically, we demonstrate the effectiveness of our method on multiple pre-trained backbones, including ViT and ResNet, across image classification, semantic segmentation, and human-trajectory prediction tasks. Notably, our approach improves the performance while simultaneously reducing equivariance error on the competitive ImageNet benchmark.

2603.26653 2026-03-30 cs.CV cs.AI cs.CL cs.LG

PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning

Shaoxuan Li, Zhixuan Zhao, Hanze Deng, Zirun Ma, Shulin Tian, Zuyan Liu, Yushi Hu, Haoning Wu, Yuhao Dong, Benlin Liu, Ziwei Liu, Ranjay Krishna

Comments Project Page: https://perceptioncomp.github.io

Abstract

We introduce PerceptionComp, a manually annotated benchmark for complex, long-horizon, perception-centric video reasoning. PerceptionComp is designed so that no single moment is sufficient: answering each question requires multiple temporally separated pieces of visual evidence and compositional constraints under conjunctive and sequential logic, spanning perceptual subtasks such as objects, attributes, relations, locations, actions, and events, and requiring skills including semantic recognition, visual correspondence, temporal reasoning, and spatial reasoning. The benchmark contains 1,114 highly complex questions on 279 videos from diverse domains including city walk tours, indoor villa tours, video games, and extreme outdoor sports, with 100% manual annotation. Human studies show that PerceptionComp requires substantial test-time thinking and repeated perception steps: participants take much longer than on prior benchmarks, and accuracy drops to near chance (18.97%) when rewatching is disallowed. State-of-the-art MLLMs also perform substantially worse on PerceptionComp than on existing benchmarks: the best model in our evaluation, Gemini-3-Flash, reaches only 45.96% accuracy in the five-choice setting, while open-source models remain below 40%. These results suggest that perception-centric long-horizon video reasoning remains a major bottleneck, and we hope PerceptionComp will help drive progress in perceptual reasoning.

2603.26652 2026-03-30 math.MG cs.CG math.CO math.DG math.GT

Surfaces without quasi-isometric simplicial triangulations

James Davies

Comments 9 pages, 3 figures

Abstract

We construct a complete Riemannian surface $Σ$ that admits no triangulation $G \subset Σ$ such that the inclusion $G^{(1)} \hookrightarrow Σ$ is a quasi-isometry, where $G^{(1)}$ is the simplicial 1-skeleton of $G$. Our construction is without boundary, has arbitrarily large systole, and furthermore, there is no embedded graph $G \subset Σ$ such that $G^{(1)} \hookrightarrow Σ$ is a quasi-isometry. This answers a question of Georgakopoulos.

2603.26647 2026-03-30 cs.LG cs.SY eess.SY

An LP-based Sampling Policy for Multi-Armed Bandits with Side-Observations and Stochastic Availability

Ashutosh Soni, Peizhong Ju, Atilla Eryilmaz, Ness B. Shroff

Abstract

We study the stochastic multi-armed bandit (MAB) problem where an underlying network structure enables side-observations across related actions. We use a bipartite graph to link actions to a set of unknowns, such that selecting an action reveals observations for all the unknowns it is connected to. While previous works rely on the assumption that all actions are permanently accessible, we investigate the more practical setting of stochastic availability, where the set of feasible actions (the "activation set") varies dynamically in each round. This framework models real-world systems with both structural dependencies and volatility, such as social networks where users provide side-information about their peers' preferences, yet are not always online to be queried. To address this challenge, we propose UCB-LP-A, a novel policy that leverages a Linear Programming (LP) approach to optimize exploration-exploitation trade-offs under stochastic availability. Unlike standard network bandit algorithms that assume constant access, UCB-LP-A computes an optimal sampling distribution over the realizable activation sets, ensuring that the necessary observations are gathered using only the currently active arms. We derive a theoretical upper bound on the regret of our policy, characterizing the impact of both the network structure and the activation probabilities. Finally, we demonstrate through numerical simulations that UCB-LP-A significantly outperforms existing heuristics that ignore either the side-information or the availability constraints.

2603.26646 2026-03-30 cs.CV

Beyond Language: Grounding Referring Expressions with Hand Pointing in Egocentric Vision

Ling Li, Bowen Liu, Zinuo Zhan, Peng Jie, Jianhui Zhong, Kenglun Chang, Zhidong Deng

Abstract

Traditional Visual Grounding (VG) predominantly relies on textual descriptions to localize objects, a paradigm that inherently struggles with linguistic ambiguity and often ignores non-verbal deictic cues prevalent in real-world interactions. In natural egocentric engagements, hand-pointing combined with speech forms the most intuitive referring mechanism. To bridge this gap, we introduce EgoPoint-Ground, the first large-scale multimodal dataset dedicated to egocentric deictic visual grounding. Comprising over 15k interactive samples in complex scenes, the dataset provides rich, multi-grained annotations including hand-target bounding box pairs and dense semantic captions. We establish a comprehensive benchmark for hand-pointing referring expression resolution, evaluating a wide spectrum of mainstream Multimodal Large Language Models (MLLMs) and state-of-the-art VG architectures. Furthermore, we propose SV-CoT, a novel baseline framework that reformulates grounding as a structured inference process, synergizing gestural and linguistic cues through a Visual Chain-of-Thought paradigm. Extensive experiments demonstrate that SV-CoT achieves an 11.7% absolute improvement over existing methods, effectively mitigating semantic ambiguity and advancing the capability of agents to comprehend multimodal physical intents. The dataset and code will be made publicly available.

2603.26644 2026-03-30 cs.LG astro-ph.IM stat.ME

Automatic Laplace Collapsed Sampling: Scalable Marginalisation of Latent Parameters via Automatic Differentiation

Toby Lovick, David Yallup, Will Handley

Comments 28 Pages, 7 Figures. Comments welcome

Abstract

We present Automatic Laplace Collapsed Sampling (ALCS), a general framework for marginalising latent parameters in Bayesian models using automatic differentiation, which we combine with nested sampling to explore the hyperparameter space in a robust and efficient manner. At each nested sampling likelihood evaluation, ALCS collapses the high-dimensional latent variables $z$ to a scalar contribution via maximum a posteriori (MAP) optimisation and a Laplace approximation, both computed using autodiff. This reduces the effective dimension from $d_θ + d_z$ to just $d_θ$, making Bayesian evidence computation tractable for high-dimensional settings without hand-derived gradients or Hessians, and with minimal model-specific engineering. The MAP optimisation and Hessian evaluation are parallelised across live points on GPU hardware, making the method practical at scale. We also show that automatic differentiation enables local approximations beyond Laplace to parametric families such as the Student-$t$, which improves evidence estimates for heavy-tailed latents. We validate ALCS on a suite of benchmarks spanning hierarchical, time-series, and discrete-likelihood models and establish where the Gaussian approximation holds. This enables a post-hoc ESS diagnostic that localises failures across hyperparameter space without expensive joint sampling.
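
The "collapse" step described above (MAP optimisation followed by a Laplace approximation) can be sketched for a scalar latent. This is only an illustration under stated assumptions: the toy model, the function names, and the use of finite differences in place of automatic differentiation are all mine, not the paper's.

```python
import math

def log_normal_pdf(x, mean, std):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)

def log_joint(z, y, theta):
    """Toy model (assumed): z ~ N(0, 1) and y | z ~ N(z, theta^2)."""
    return log_normal_pdf(z, 0.0, 1.0) + log_normal_pdf(y, z, theta)

def laplace_log_marginal(y, theta, z0=0.0, steps=20, h=1e-4):
    """Approximate log p(y | theta) by collapsing the scalar latent z."""
    f = lambda z: log_joint(z, y, theta)
    z = z0
    for _ in range(steps):
        # Finite differences stand in for autodiff gradients and Hessians here.
        grad = (f(z + h) - f(z - h)) / (2 * h)
        hess = (f(z + h) - 2 * f(z) + f(z - h)) / (h * h)
        z -= grad / hess  # Newton step toward the MAP estimate
    hess = (f(z + h) - 2 * f(z) + f(z - h)) / (h * h)
    # Laplace: log integral ~ f(z_map) + (1/2) log(2*pi) - (1/2) log(-f''(z_map))
    return f(z) + 0.5 * math.log(2 * math.pi) - 0.5 * math.log(-hess)

y_obs, theta = 1.3, 0.5
approx = laplace_log_marginal(y_obs, theta)
# For this Gaussian toy model the marginal is known: y ~ N(0, 1 + theta^2).
exact = log_normal_pdf(y_obs, 0.0, math.sqrt(1.0 + theta ** 2))
print(approx, exact)
```

Because the toy joint density is Gaussian in `z`, the Laplace approximation matches the exact marginal up to finite-difference error; for non-Gaussian latents it would only be approximate, which is the regime the abstract's diagnostics target.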

2603.26643 2026-03-30 math.NA cs.NA

Boundary neuron method for solving partial differential equations

Ye Lin, Wentao Liu, Young Ju Lee, Jiwei Jia

Abstract

We propose a boundary neuron method with random features (BNM-RF) for solving partial differential equations. The method approximates the unknown boundary function by a shallow network within the boundary integral formulation. With randomly sampled and fixed hidden parameters, the computation reduces to a linear least squares problem for the output coefficients, which avoids gradient-based nonconvex optimization. This construction retains the dimensionality reduction of boundary integral equations and the linear solution structure of the random feature method. For elliptic problems, we establish a convergence analysis by combining a kernel-based method with random feature approximation, and obtain error bounds on both the boundary and the interior solution. Numerical experiments on Laplace and Helmholtz problems, including interior and exterior cases, show that the proposed method achieves competitive accuracy relative to the boundary element method and favorable performance relative to boundary integral neural networks in the tested settings with only a few neurons. Overall, the proposed method provides a practical framework for combining boundary integral equations with neural networks for problems on complex geometries and unbounded domains.

2603.26639 2026-03-30 cs.CV cs.AI

Make Geometry Matter for Spatial Reasoning

Shihua Zhang, Qiuhong Shen, Shizun Wang, Tianbo Pan, Xinchao Wang

Abstract

Empowered by large-scale training, vision-language models (VLMs) achieve strong image and video understanding, yet their ability to perform spatial reasoning in both static scenes and dynamic videos remains limited. Recent advances try to handle this limitation by injecting geometry tokens from pretrained 3D foundation models into VLMs. Nevertheless, we observe that naive token fusion followed by standard fine-tuning in this line of work often leaves such geometric cues underutilized for spatial reasoning, as VLMs tend to rely heavily on 2D visual cues. In this paper, we propose GeoSR, a framework designed to make geometry matter by encouraging VLMs to actively reason with geometry tokens. GeoSR introduces two key components: (1) Geometry-Unleashing Masking, which strategically masks portions of 2D vision tokens during training to weaken non-geometric shortcuts and force the model to consult geometry tokens for spatial reasoning; and (2) Geometry-Guided Fusion, a gated routing mechanism that adaptively amplifies geometry token contributions in regions where geometric evidence is critical. Together, these designs unleash the potential of geometry tokens for spatial reasoning tasks. Extensive experiments on both static and dynamic spatial reasoning benchmarks demonstrate that GeoSR consistently outperforms prior methods and establishes new state-of-the-art performance by effectively leveraging geometric information. The project page is available at https://suhzhang.github.io/GeoSR/.

2603.26638 2026-03-30 cs.CV cs.RO

Drive-Through 3D Vehicle Exterior Reconstruction via Dynamic-Scene SfM and Distortion-Aware Gaussian Splatting

Nitin Kulkarni, Akhil Devarashetti, Charlie Cluss, Livio Forte, Philip Schneider, Chunming Qiao, Alina Vereshchaka

Comments 8 pages, 7 figures, Submitted to IEEE IROS 2026 (under review)

Abstract

High-fidelity 3D reconstruction of vehicle exteriors improves buyer confidence in online automotive marketplaces, but generating these models in cluttered dealership drive-throughs presents severe technical challenges. Unlike static-scene photogrammetry, this setting features a dynamic vehicle moving against heavily cluttered, static backgrounds. This problem is further compounded by wide-angle lens distortion, specular automotive paint, and non-rigid wheel rotations that violate classical epipolar constraints. We propose an end-to-end pipeline utilizing a two-pillar camera rig. First, we resolve dynamic-scene ambiguities by coupling SAM 3 for instance segmentation with motion-gating to cleanly isolate the moving vehicle, explicitly masking out non-rigid wheels to enforce strict epipolar geometry. Second, we extract robust correspondences directly on raw, distorted 4K imagery using the RoMa v2 learned matcher guided by semantic confidence masks. Third, these matches are integrated into a rig-aware SfM optimization that utilizes CAD-derived relative pose priors to eliminate scale drift. Finally, we use a distortion-aware 3D Gaussian Splatting framework (3DGUT) coupled with a stochastic Markov Chain Monte Carlo (MCMC) densification strategy to render reflective surfaces. Evaluations on 25 real-world vehicles across 10 dealerships demonstrate that our full pipeline achieves a PSNR of 28.66 dB, an SSIM of 0.89, and an LPIPS of 0.21 on held-out views, representing a 3.85 dB improvement over standard 3D-GS, delivering inspection-grade interactive 3D models without controlled studio infrastructure.

2603.26637 2026-03-30 cs.AR

Who Checks the Checker? Enhancing Component-level Architectural SEU Fault Tolerance for End-to-End SoC Protection

Michael Rogenmoser, Philippe Sauter, Chen Wu, Angelo Garofalo, Luca Benini

Comments 7 pages, accepted at VLSI Test Symposium 2026 (VTS 2026)

Abstract

Single-event upset (SEU) fault tolerance for systems-on-chip (SoCs) in radiation-heavy environments is often addressed by architectural fault-tolerance approaches protecting individual SoC components (e.g., cores, memories) in isolation. However, the protection of voting logic and interconnections among components is also critical, as these become single points of failure in the design. We investigate combining multiple fault-tolerance approaches targeting individual SoC components, including interconnect and voting logic, to ensure end-to-end SoC-level architectural SEU fault tolerance while minimizing implementation area overheads. Enforcing an overlap between the protection methods ensures hardening of the whole design without gaps, while curtailing overheads. We demonstrate our approach on a RISC-V microcontroller SoC. SEU fault-tolerance is assessed with simulation-based fault injection. Overheads are assessed with full physical implementation. Tolerance to over 99.9% of faults in both RTL and implemented netlist is demonstrated. Furthermore, the design exhibits 22% lower implementation overhead compared to a single global fault-tolerance method, such as fine-grained triplication.
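
For readers unfamiliar with the triplication and voting logic mentioned above, the following software model shows a bitwise 2-of-3 majority voter masking a single bit flip. It is a generic illustration, not the paper's RTL; the word value and function name are hypothetical.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority over three redundant copies of a word."""
    return (a & b) | (b & c) | (a & c)

golden = 0b1011_0010  # hypothetical register contents

# Emulate a single-event upset: flip one bit in exactly one replica.
for bit in range(8):
    upset = golden ^ (1 << bit)
    assert majority_vote(upset, golden, golden) == golden
    assert majority_vote(golden, upset, golden) == golden
    assert majority_vote(golden, golden, upset) == golden

# A fault that hits the voter's own output is NOT masked by this scheme.
faulty_voter_output = majority_vote(golden, golden, golden) ^ 1
print(faulty_voter_output != golden)
```

The last lines illustrate the abstract's point: the voter protects the replicas but is itself a single point of failure unless it is also protected.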

2603.26636 2026-03-30 physics.app-ph cs.SY eess.SY

Patched-Wall Quasistatic Cavity Resonators for 3-D Wireless Power Transfer

Takuya Sasatani, Yoshihiro Kawahara

Comments 5 pages, 6 figures

Abstract

Traditional wireless power transfer (WPT) systems are largely limited to 1-D charging pads or 2-D charging surfaces and therefore do not support a truly ubiquitous device-powering experience. Although room-scale WPT based on multimode quasistatic cavity resonance (QSCR) has demonstrated full-volume coverage by leveraging multiple resonant modes, existing high-coverage implementations require obstructive internal conductive structures, such as a central pole. This letter presents a new structure, termed the patched-wall QSCR, that eliminates such internal obstructions while preserving full-volume coverage. By using conductive wall segments interconnected by capacitors, the proposed structure supports two complementary resonant modes that cover both the peripheral and central regions without obstructions within the charging volume. Electromagnetic simulations show that, by selectively exciting these two resonant modes, the proposed structure achieves a minimum power-transfer efficiency of 48.1% across the evaluated 54 m³ charging volume while preserving an unobstructed interior space.

2603.26635 2026-03-30 cs.MA

Deception and Communication in Autonomous Multi-Agent Systems: An Experimental Study with Among Us

Maria Milkowski, Tim Weninger

Comments 8 pages + references, 9 figures. Accepted at AAMAS 2026

Journal ref
Proceedings of the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026), IFAAMAS, 2026
Abstract

As large language models are deployed as autonomous agents, their capacity for strategic deception raises core questions for coordination, reliability, and safety in multi-goal, multi-agent systems. We study deception and communication in LLM agents through the social deduction game Among Us, a cooperative-competitive environment. Across 1,100 games, autonomous agents produced over one million tokens of meeting dialogue. Using speech act theory and interpersonal deception theory, we find that all agents rely mainly on directive language, while impostor agents shift slightly toward representative acts such as explanations and denials. Deception appears primarily as equivocation rather than outright lies, increasing under social pressure but rarely improving win rates. Our contributions are a large-scale analysis of role-conditioned deceptive behavior in LLM agents and empirical evidence that current agents favor low-risk ambiguity that is linguistically subtle yet strategically limited, revealing a fundamental tension between truthfulness and utility in autonomous communication.

2603.26632 2026-03-30 cs.CR cs.AI cs.LG

Machine Learning Transferability for Malware Detection

César Vieira, João Vitorino, Eva Maia, Isabel Praça

Comments 12 pages, 1 Figure, 2 tables, World CIST 2026

Abstract

Malware continues to be a predominant operational risk for organizations, especially when obfuscation techniques are used to evade detection. Despite the ongoing efforts in the development of Machine Learning (ML) detection approaches, there is still a lack of feature compatibility in public datasets. This limits generalization when facing distribution shifts, as well as transferability to different datasets. This study evaluates the suitability of different data preprocessing approaches for the detection of Portable Executable (PE) files with ML models. The preprocessing pipeline unifies the datasets on EMBERv2 (2,381-dimensional) features and trains paired models under two training setups: EMBER + BODMAS and EMBER + BODMAS + ERMDS. For model evaluation, both the EMBER + BODMAS and the EMBER + BODMAS + ERMDS models are tested against TRITIUM, INFERNO and SOREL-20M. ERMDS is also used as a test set for the EMBER + BODMAS setup.

2603.26631 2026-03-30 cs.GT cs.SI

Learning From Social Interactions: Personalized Pricing and Buyer Manipulation

Qinqi Lin, Lingjie Duan, Jianwei Huang

Comments Published in IEEE Transactions on Mobile Computing (a complete version with supplementary materials included)

Journal ref
IEEE Transactions on Mobile Computing, vol. 23, no. 12, pp. 11871-11888, Dec. 2024
Abstract

As the sociological theory of homophily suggests, people tend to interact with those of similar preferences. Motivated by this well-established phenomenon, today's online sellers, such as Amazon, seek to learn a new buyer's private preference from his friends' purchase records. Although such learning allows the seller to enable personalized pricing and boost revenue, buyers are also increasingly aware of these practices and may alter their social behaviors accordingly. This paper presents the first study regarding how buyers strategically manipulate their social interaction signals considering their preference correlations, and how a seller can take buyers' strategic social behaviors into consideration when designing the pricing scheme. Starting with the fundamental two-buyer network, we propose and analyze a parsimonious model that uniquely captures the double-layered information asymmetry between the seller and buyers, integrating both individual buyer information and inter-buyer correlation information. Our analysis reveals that only high-preference buyers tend to manipulate their social interactions to evade the seller's personalized pricing, but surprisingly, their payoffs may actually worsen as a result. Moreover, we demonstrate that the seller can considerably benefit from the learning practice, regardless of whether the buyers are aware of this fact or not. Indeed, our analysis reveals that buyers' learning-aware strategic manipulation has only a slight impact on the seller's revenue. In light of the tightening regulatory policies concerning data access, it is advisable for sellers to maintain transparency with buyers regarding their access to buyers' social interaction data for learning purposes. This finding aligns well with current informed-consent industry practices for data sharing.

2603.26629 2026-03-30 cs.LG

Context-specific Credibility-aware Multimodal Fusion with Conditional Probabilistic Circuits

Pranuthi Tenali, Sahil Sidheekh, Saurabh Mathur, Erik Blasch, Kristian Kersting, Sriraam Natarajan

Abstract

Multimodal fusion requires integrating information from multiple sources that may conflict depending on context. Existing fusion approaches typically rely on static assumptions about source reliability, limiting their ability to resolve conflicts when a modality becomes unreliable due to situational factors such as sensor degradation or class-specific corruption. We introduce C$^2$MF, a context-specific credibility-aware multimodal fusion framework that models per-instance source reliability using a Conditional Probabilistic Circuit (CPC). We formalize instance-level reliability through Context-Specific Information Credibility (CSIC), a KL-divergence-based measure computed exactly from the CPC. CSIC generalizes conventional static credibility estimates as a special case, enabling principled and adaptive reliability assessment. To evaluate robustness under cross-modal conflicts, we propose the Conflict benchmark, in which class-specific corruptions deliberately induce discrepancies between different modalities. Experimental results show that C$^2$MF improves predictive accuracy by up to 29% over static-reliability baselines in high-noise settings, while preserving the interpretability advantages of probabilistic circuit-based fusion.

2603.26628 2026-03-30 cs.IT math.IT

USAM: A Unified Safety-Age metric for Timeliness in Heterogeneous IoT Systems

Mikael Gidlund

Details
English abstract

Massive Internet-of-Things (IoT) deployments must simultaneously support monitoring, control, and safety-critical communication over shared wireless infrastructure. Classical timeliness metrics, such as Age of Information and its variants, quantify the freshness of received updates but do not account for deterministic safety timing requirements that arise in cyber-physical systems. Consequently, freshness-oriented metrics may indicate satisfactory performance even when worst-case timing guarantees required by functional safety standards are violated. This paper introduces the Unified Safety-Age Metric (USAM), a safety-aware timeliness metric that integrates information freshness, deadline reliability, and deterministic response-time feasibility into a single architecture-aware performance measure. We consider heterogeneous IoT traffic served by a gateway with intermittent receiver readiness and analyze system behavior in the ultra-sparse regime typical of massive machine-type communications. The analysis shows that, as device activity decreases, queueing delays become negligible and system timeliness becomes dominated by infrastructure readiness and deterministic response-time constraints. In this regime, feasibility is determined primarily by the receiver duty cycle rather than by average traffic load. Numerical results illustrate the safety-blindness of classical freshness metrics and demonstrate that USAM explicitly captures the feasibility boundary imposed by heterogeneous traffic requirements. The proposed framework provides a foundation for analyzing safety-aware communication architectures in large-scale IoT systems.
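
The "safety-blindness" of freshness metrics described in this abstract can be illustrated with a small sketch. It is not the USAM metric itself, whose definition the abstract does not give. It only computes the classical peak Age of Information for a hypothetical update trace, assuming in-order delivery, and compares the average against an assumed hard deadline. The names `peak_ages`, `updates`, and `deadline` are illustrative.

```python
from typing import List, Sequence, Tuple


def peak_ages(updates: Sequence[Tuple[float, float]]) -> List[float]:
    """Peak Age of Information for an in-order update stream.

    `updates` holds (generation_time, delivery_time) pairs sorted by
    delivery time. Just before delivery k, the freshest information at
    the receiver was generated at g_{k-1}, so the peak age is
    d_k - g_{k-1}.
    """
    return [updates[k][1] - updates[k - 1][0] for k in range(1, len(updates))]


# Hypothetical trace: three regular updates, then a long gap.
updates = [(0.0, 1.0), (2.0, 3.0), (4.0, 5.0), (14.0, 15.0)]
ages = peak_ages(updates)

mean_peak = sum(ages) / len(ages)   # average freshness looks acceptable
deadline = 6.0                      # assumed worst-case age bound
violated = max(ages) > deadline     # the worst case exceeds the bound
```

In this trace the mean peak age stays below the assumed deadline while the largest peak age exceeds it, which is the kind of gap between average freshness and worst-case timing that the abstract points to.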

2603.26621 2026-03-30 eess.SY cs.SY

Inclusion conditions for the Constrained Polynomial Zonotopic case

Bogdan Gheorghe, Amr Alanwar, Florin Stoican

Details
English abstract

Set operations are well understood for convex sets but become considerably more challenging in the non-convex case due to the loss of structural properties in their representation. Constrained polynomial zonotopes (CPZs) offer an effective compromise, as they can capture complex, typically non-convex geometries while maintaining an algebraic structure suitable for further manipulation. Building on this, we propose novel nonlinear encodings that provide sufficient conditions for testing inclusion between two CPZs and adapt them for seamless integration within optimization frameworks.

2603.26614 2026-03-30 cs.IT math.IT

Function-Based Minimal Linear Codes over Galois Rings $\mathrm{GR}(p^{n}, \ell)$: Minimality Criteria and Infinite Constructions

Biplab Chatterjee, Sihem Mesnager, Ratnesh Kumar Mishra, Makhan Maji, Kalyan Hansda

Details
English abstract

In this paper, we extend a necessary and sufficient condition for a linear code over a Galois ring to be minimal and establish new bounds on the length of an $m$-dimensional minimal linear code. Building upon this structural characterization, we further generalize the function-based minimality criteria introduced by Wu et al. (Cryptogr. Commun. 14, 875-895, 2022) from the finite field setting to the framework of Galois rings. The transition from fields to rings introduces substantial algebraic challenges due to the presence of zero divisors and the richer module structure of $\mathrm{GR}(p^{n},\ell)$. By exploiting Frobenius duality and the chain structure of Galois rings, we derive refined necessary and sufficient conditions ensuring that linear codes arising from functions over $\mathrm{GR}(p^{n},\ell)$ are minimal. As an application of these criteria, we construct several infinite families of minimal linear codes over Galois rings, thereby significantly generalizing the constructions of Wu et al. to the ring setting. Our results provide a unified framework that connects minimality theory, module duality over Frobenius rings, and function-based code constructions.

2603.26611 2026-03-30 cs.LG stat.ME stat.ML

Benchmarking Tabular Foundation Models for Conditional Density Estimation in Regression

Rafael Izbicki, Pedro L. C. Rodrigues

Details
English abstract

Conditional density estimation (CDE), the task of recovering the full conditional distribution of a response given tabular covariates, is essential in settings with heteroscedasticity, multimodality, or asymmetric uncertainty. Recent tabular foundation models, such as TabPFN and TabICL, naturally produce predictive distributions, but their effectiveness as general-purpose CDE methods has not been systematically evaluated, unlike their performance for point prediction, which is well studied. We benchmark three tabular foundation model variants against a diverse set of parametric, tree-based, and neural CDE baselines on 39 real-world datasets, across training sizes from 50 to 20,000, using six metrics covering density accuracy, calibration, and computation time. Across all sample sizes, foundation models achieve the best CDE loss, log-likelihood, and CRPS on the large majority of datasets tested. Calibration is competitive at small sample sizes but, for some metrics and datasets, lags behind task-specific neural baselines at larger sample sizes, suggesting that post-hoc recalibration may be a valuable complement. In a photometric redshift case study using SDSS DR18, TabPFN exposed to 50,000 training galaxies outperforms all baselines trained on the full 500,000-galaxy dataset. Taken together, these results establish tabular foundation models as strong off-the-shelf conditional density estimators.
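
For readers unfamiliar with CRPS, one of the metrics named in this abstract, the following is a minimal sketch of the standard sample-based estimator, CRPS = E|X - y| - 0.5 E|X - X'|, where X and X' are independent draws from the predictive distribution and y is the observed response. The abstract does not say how the benchmark computes CRPS, so this is an assumption for illustration only, and `crps_from_samples` is a hypothetical helper name.

```python
from typing import Sequence


def crps_from_samples(samples: Sequence[float], y: float) -> float:
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|.

    `samples` are draws from the predictive distribution for one test
    point and `y` is the observed response. Lower is better.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    # Average distance between the predictive draws and the observation.
    term_obs = sum(abs(x - y) for x in samples) / n
    # Average pairwise distance between draws (rewards sharp forecasts).
    term_spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term_obs - 0.5 * term_spread


# A single-sample "distribution" reduces CRPS to the absolute error.
point_forecast = crps_from_samples([2.0], 3.0)
# A two-point forecast centred on the observation scores better.
spread_forecast = crps_from_samples([0.0, 2.0], 1.0)
```

The double sum is quadratic in the number of samples, which is fine for a sketch but would usually be replaced by a sorted-sample formula for large sample counts.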

2603.26610 2026-03-30 cs.CV cs.AI

Think over Trajectories: Leveraging Video Generation to Reconstruct GPS Trajectories from Cellular Signaling

Ruixing Zhang, Hanzhang Jiang, Leilei Sun, Liangzhe Han, Jibin Wang, Weifeng Lv

Details
English abstract

Mobile devices continuously interact with cellular base stations, generating massive volumes of signaling records that provide broad coverage for understanding human mobility. However, such records offer only coarse location cues (e.g., serving-cell identifiers), which limits their direct use in applications that require high-precision GPS trajectories. This paper studies the Sig2GPS problem: reconstructing GPS trajectories from cellular signaling. Inspired by how domain experts often lay the signaling trace on a map and sketch the corresponding GPS route, and unlike conventional solutions that rely on complex multi-stage engineering pipelines or regress coordinates, Sig2GPS is reframed as an image-to-video generation task that operates directly in the map-visual domain: signaling traces are rendered on a map, and a video generation model is trained to draw a continuous GPS path. To support this paradigm, a paired signaling-to-trajectory video dataset is constructed to fine-tune an open-source video model, and a trajectory-aware reinforcement learning-based optimization method is introduced to improve generation fidelity via rewards. Experiments on large-scale real-world datasets show substantial improvements over strong engineered and learning-based baselines, while additional results on next GPS prediction indicate scalability and cross-city transferability. Overall, these results suggest that map-visual video generation provides a practical interface for trajectory data mining by enabling direct generation and refinement of continuous paths under map constraints.

2603.26608 2026-03-30 cs.HC cs.ET

Sticky and Magnetic: Evaluating Error Correction and User Adaptation in Gaze and Pinch Interaction

Jazmin Collins, Prasanthi Gurumurthy, Eric J. Gonzalez, Mar Gonzalez-Franco

Comments 5 pages, 5 figures, Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems (CHI '26), April 13-17, 2026, Barcelona, Spain. ACM

Details
Journal ref
In Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems (CHI EA '26). 2026
English abstract

The gaze-and-pinch framework offers a high-fidelity interaction modality for spatial computing in virtual reality (VR), yet it remains vulnerable to coordination errors, that is, timing misalignments between gaze fixation and pinch gestures. These errors are categorized into two types: late triggers (gaze leaves a target before pinch) and early triggers (pinch before gaze arrival on target). While late triggers are well-studied, early triggers lack robust solutions. We investigate two heuristics, STICKY selection (temporal buffer) and MAGNETIC selection (spatial field), to mitigate these errors. A within-subjects study (N = 9) on the Samsung Galaxy XR evaluated these heuristics against a baseline. Findings indicate that while throughput and selection time remained stable, the heuristics fundamentally shifted user behavior and significantly reduced errors during selection. Notably, MAGNETIC selection induced an "offloading" effect where users traded precision for speed. Additionally, the heuristics reclassified ambiguous failures as explainable coordination errors. We provide recommendations for selection heuristics that enhance interaction speed and cognitive agency in virtual reality.

2603.26604 2026-03-30 cs.LG hep-ph physics.ins-det

Hardware-Aware Tensor Networks for Real-Time Quantum-Inspired Anomaly Detection at Particle Colliders

Sagar Addepalli, Prajita Bhattarai, Abhilasha Dave, Julia Gonski

Comments 28 pages, 9 figures

Details
English abstract

Quantum machine learning offers the ability to capture complex correlations in high-dimensional feature spaces, crucial for the challenge of detecting beyond the Standard Model physics in collider events, along with the potential for unprecedented computational efficiency in future quantum processors. Near-term utilization of these benefits can be achieved by developing quantum-inspired algorithms for deployment in classical hardware to enable applications at the "edge" of current scientific experiments. This work demonstrates the use of tensor networks for real-time anomaly detection in collider detectors. A spaced matrix product operator (SMPO) is developed that provides sensitivity to a variety of beyond the Standard Model benchmarks, and can be implemented in field programmable gate array hardware with resources and latency consistent with trigger deployment. The cascaded SMPO architecture is introduced as an SMPO variation that affords greater flexibility and efficiency in ways that are key to edge applications in resource-constrained environments. These results reveal the benefit and near-term feasibility of deploying quantum-inspired ML in high energy colliders.

2603.26599 2026-03-30 cs.CV

VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward

Zhaochong An, Orest Kupyn, Théo Uscidda, Andrea Colaco, Karan Ahuja, Serge Belongie, Mar Gonzalez-Franco, Marta Tintore Gazulla

Comments Project Page: https://zhaochongan.github.io/projects/VGGRPO

Details
English abstract

Large-scale video diffusion models achieve impressive visual quality, yet often fail to preserve geometric consistency. Prior approaches improve consistency either by augmenting the generator with additional modules or applying geometry-aware alignment. However, architectural modifications can compromise the generalization of internet-scale pretrained models, while existing alignment methods are limited to static scenes and rely on RGB-space rewards that require repeated VAE decoding, incurring substantial compute overhead and failing to generalize to highly dynamic real-world scenes. To preserve the pretrained capacity while improving geometric consistency, we propose VGGRPO (Visual Geometry GRPO), a latent geometry-guided framework for geometry-aware video post-training. VGGRPO introduces a Latent Geometry Model (LGM) that stitches video diffusion latents to geometry foundation models, enabling direct decoding of scene geometry from the latent space. By constructing LGM from a geometry model with 4D reconstruction capability, VGGRPO naturally extends to dynamic scenes, overcoming the static-scene limitations of prior methods. Building on this, we perform latent-space Group Relative Policy Optimization with two complementary rewards: a camera motion smoothness reward that penalizes jittery trajectories, and a geometry reprojection consistency reward that enforces cross-view geometric coherence. Experiments on both static and dynamic benchmarks show that VGGRPO improves camera stability, geometry consistency, and overall quality while eliminating costly VAE decoding, making latent-space geometry-guided reinforcement an efficient and flexible approach to world-consistent video generation.
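
The Group Relative Policy Optimization step mentioned in this abstract relies on scoring each sample relative to the other samples in its group. The sketch below shows only that generic normalization, advantage = (reward - group mean) / group standard deviation. It is not VGGRPO's implementation: how the camera-motion and reprojection rewards are computed and combined is not stated in the abstract, so the reward values here are made up, and the use of the population standard deviation and the `eps` term are assumptions.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(
    rewards: Sequence[float], eps: float = 1e-8
) -> List[float]:
    """Normalize rewards within one group of samples for the same prompt.

    Each advantage is (r - mean) / (std + eps), so samples are scored
    relative to their siblings rather than on an absolute scale.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical scalar rewards for three videos generated from one prompt.
advantages = group_relative_advantages([1.0, 2.0, 3.0])
# Identical rewards carry no learning signal: all advantages are zero.
flat = group_relative_advantages([0.5, 0.5, 0.5])
```

Because the advantages are centred within each group, no separate value network is needed to provide a baseline, which is the usual motivation for this style of normalization.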