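arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.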

2605.06667 2026-05-08 cs.CV cs.AI cs.LG

ActCam: Zero-Shot Joint Camera and 3D Motion Control for Video Generation

Omar El Khalifi, Thomas Rossi, Oscar Fossey, Thibault Fouque, Ulysse Mizrahi, Philip Torr, Ivan Laptev, Fabio Pizzati, Baptiste Bellot-Gurlet

Comments SIGGRAPH 2026

Abstract

For artistic applications, video generation requires fine-grained control over both performance and cinematography, i.e., the actor's motion and the camera trajectory. We present ActCam, a zero-shot method for video generation that jointly transfers character motion from a driving video into a new scene and enables per-frame control of intrinsic and extrinsic camera parameters. ActCam builds on any pretrained image-to-video diffusion model that accepts conditioning in terms of scene depth and character pose. Given a source video with a moving character and a target camera motion, ActCam generates pose and depth conditions that remain geometrically consistent across frames. We then run a single sampling process with a two-phase conditioning schedule: early denoising steps condition on both pose and sparse depth to enforce scene structure, after which depth is dropped and pose-only guidance refines high-frequency details without over-constraining the generation. We evaluate ActCam on multiple benchmarks spanning diverse character motions and challenging viewpoint changes. We find that, compared to pose-only control and other pose and camera methods, ActCam improves camera adherence and motion fidelity, and is preferred in human evaluations, especially under large viewpoint changes. Our results highlight that careful camera-consistent conditioning and staged guidance can enable strong joint camera and motion control without training. Project page: https://elkhomar.github.io/actcam/.
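
The two-phase conditioning schedule described in this abstract can be illustrated with a minimal sketch. It only shows which conditions are active at each denoising step; it is not the authors' implementation. The function name `two_phase_schedule` and the `depth_steps` knob are hypothetical, and the abstract does not state where the switch happens.

```python
from typing import List, Tuple


def two_phase_schedule(num_steps: int, depth_steps: int) -> List[Tuple[str, ...]]:
    """Return the conditions that are active at each denoising step.

    depth_steps is a hypothetical knob: the number of early steps that keep
    sparse depth alongside pose. Later steps use pose only.
    """
    if not 0 <= depth_steps <= num_steps:
        raise ValueError("depth_steps must lie in [0, num_steps]")
    schedule: List[Tuple[str, ...]] = []
    for step in range(num_steps):
        if step < depth_steps:
            # Phase 1: pose + sparse depth, to pin down scene structure.
            schedule.append(("pose", "depth"))
        else:
            # Phase 2: depth is dropped, pose-only guidance refines details.
            schedule.append(("pose",))
    return schedule


# Illustrative values only: 10 sampling steps, depth kept for the first 3.
schedule = two_phase_schedule(num_steps=10, depth_steps=3)
for step, conditions in enumerate(schedule):
    print(step, "+".join(conditions))
```

In a real sampler, each tuple would select which conditioning inputs are passed to the diffusion model at that step; the single sampling loop itself stays unchanged.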

2605.06665 2026-05-08 cs.LG cs.AI

UniPool: A Globally Shared Expert Pool for Mixture-of-Experts

Minbin Huang, Han Shi, Chuanyang Zheng, Yimeng Wu, Guoxuan Chen, Xintong Yu, Yichun Yin, Hong Cheng

Abstract

Modern Mixture-of-Experts (MoE) architectures allocate expert capacity through a rigid per-layer rule: each transformer layer owns a separate expert set. This convention couples depth scaling with linear expert-parameter growth and assumes that every layer needs isolated expert capacity. However, recent analyses and our routing probe challenge this allocation rule: replacing a deeper layer's learned top-k router with uniform random routing drops downstream accuracy by only 1.0-1.6 points across multiple production MoE models. Motivated by this redundancy, we propose UniPool, an MoE architecture that treats expert capacity as a global architectural budget by replacing per-layer expert ownership with a single shared pool accessed by independent per-layer routers. To enable stable and balanced training under sharing, we introduce a pool-level auxiliary loss that balances expert utilization across the entire pool, and adopt NormRouter to provide sparse and scale-stable routing into the shared expert pool. Across five LLaMA-architecture model scales (182M, 469M, 650M, 830M, and 978M parameters) trained on 30B tokens from the Pile, UniPool consistently improves validation loss and perplexity over the matched vanilla MoE baselines. Across these scales, UniPool reduces validation loss by up to 0.0386 relative to vanilla MoE. Beyond raw loss improvement, our results identify pool size as an explicit depth-scaling hyperparameter: reduced-pool UniPool variants using only 41.6%-66.7% of the vanilla expert-parameter budget match or outperform layer-wise MoE at the tested scales. This shows that, under a shared-pool design, expert parameters need not grow linearly with depth; they can grow sublinearly while remaining more efficient and effective than vanilla MoE. Further analysis shows that UniPool's benefits compose with finer-grained expert decomposition.
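
The core structural idea in this abstract, one expert pool shared by all layers with an independent router per layer, can be sketched as follows. This is a toy illustration under stated assumptions: plain dot-product scoring with top-k selection stands in for the paper's NormRouter, the pool-level auxiliary loss is not implemented (only the pool-wide usage statistics it would balance), and all names (`SharedPoolRouting`, `route`, `usage_fractions`) are hypothetical.

```python
import random
from collections import Counter
from typing import List


def top_k_indices(scores: List[float], k: int) -> List[int]:
    """Indices of the k largest scores."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


class SharedPoolRouting:
    """Per-layer routers that all select from one global expert pool."""

    def __init__(self, num_layers: int, pool_size: int, dim: int, k: int, seed: int = 0):
        rng = random.Random(seed)
        self.k = k
        self.pool_size = pool_size
        # One router weight matrix per layer; every router scores the SAME pool.
        self.routers = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pool_size)]
            for _ in range(num_layers)
        ]
        # Usage is tracked across the whole pool, not per layer.
        self.usage: Counter = Counter()

    def route(self, layer: int, token: List[float]) -> List[int]:
        """Pick k experts from the shared pool using this layer's router."""
        scores = [sum(w * x for w, x in zip(row, token)) for row in self.routers[layer]]
        chosen = top_k_indices(scores, self.k)
        self.usage.update(chosen)
        return chosen

    def usage_fractions(self) -> List[float]:
        """Fraction of all routing decisions that went to each pooled expert."""
        total = sum(self.usage.values())
        return [self.usage[e] / total if total else 0.0 for e in range(self.pool_size)]


# Illustrative sizes only. The same token is reused at every layer for brevity;
# in a real model each layer would see a different hidden state.
routing = SharedPoolRouting(num_layers=4, pool_size=8, dim=3, k=2)
token = [1.0, -0.5, 0.25]
selections = [routing.route(layer, token) for layer in range(4)]
fractions = routing.usage_fractions()
print(selections)
print(fractions)
```

Because the pool size is decoupled from the number of layers, adding layers here adds routers but no experts, which is the sense in which pool size becomes its own hyperparameter.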

2605.06664 2026-05-08 cs.CV cs.AI

BAMI: Training-Free Bias Mitigation in GUI Grounding

Borui Zhang, Bo Zhang, Bo Wang, Wenzhao Zheng, Yuhao Cheng, Liang Tang, Yiqiang Yan, Jie Zhou, Jiwen Lu

Comments Accepted by CVPR 2026

2605.06662 2026-05-08 cs.RO

Multi-Robot Coordination in V2X Environments

John Pravin Arockiasamy, Alexey Vinel

Comments Accepted for publication at the IEEE Intelligent Transportation Systems Conference (ITSC), 2026

2605.06660 2026-05-08 cs.LG cs.AI cs.CL

Verifier-Backed Hard Problem Generation for Mathematical Reasoning

Yuhang Lai, Jiazhan Feng, Yee Whye Teh, Ning Miao

2605.06658 2026-05-08 cs.CV

Relit-LiVE: Relight Video by Jointly Learning Environment Video

Weiqing Xiao, Hong Li, Xiuyu Yang, Houyuan Chen, Wenyi Li, Tianqi Liu, Shaocong Xu, Chongjie Ye, Hao Zhao, Beibei Wang

Comments Accepted at SIGGRAPH 2026. Project site: https://github.com/zhuxing0/Relit-LiVE

2605.06656 2026-05-08 cs.LG cs.DM cs.ET math.OC

Why Global LLM Leaderboards Are Misleading: Small Portfolios for Heterogeneous Supervised ML

Jai Moondra, Ayela Chughtai, Bhargavi Lanka, Swati Gupta

Abstract

Ranking LLMs via pairwise human feedback underpins current leaderboards for open-ended tasks, such as creative writing and problem-solving. We analyze ~89K comparisons in 116 languages from 52 LLMs from Arena, and show that the best-fit global Bradley-Terry (BT) ranking is misleading. Nearly 2/3 of the decisive votes cancel out, and even the top 50 models according to the global BT ranking are statistically indistinguishable (pairwise win probabilities are at most 0.53 within the top 50 models). We trace this failure to strong, structured heterogeneity of opinions across language, task, and time. Moreover, we find an important characteristic: *language* plays a key role. Grouping by language (and families) increases the agreement of votes massively, resulting in two orders of magnitude higher spread in the Elo scores (i.e., very consistent rankings). What appears as global noise is in fact a mixture of coherent but conflicting subpopulations. To address such heterogeneity in supervised machine learning, we introduce the framework of $(λ, ν)$-portfolios, which are small sets of models that achieve a prediction error at most $λ$, "covering" at least a $ν$ fraction of users. We formulate this as a variant of the set cover problem and provide guarantees using the VC dimension of the underlying set system. On the Arena data, our algorithms recover just 5 distinct BT rankings that cover over 96% of votes at a modest $λ$, compared to the 21% coverage by the global ranking. We also provide a portfolio of 6 LLMs that cover twice as many votes as the top-6 LLMs from a global ranking. We further construct portfolios for a classification problem on the COMPAS dataset using an ensemble of fairness-regularized classification models and show that these portfolios can be used to detect blind spots in the data, which might be of independent interest to policymakers.
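
The $(λ, ν)$-portfolio definition in this abstract can be made concrete with a small sketch. It uses the textbook greedy heuristic for set cover, which is an assumption on our part: the abstract does not say which algorithm the authors use. The error table, model names, and function names below are all hypothetical.

```python
from typing import Dict, List, Set


def coverage_sets(errors: Dict[str, List[float]], lam: float) -> Dict[str, Set[int]]:
    """A model covers user u when its prediction error on u is at most lam."""
    return {m: {u for u, e in enumerate(errs) if e <= lam} for m, errs in errors.items()}


def greedy_portfolio(covers: Dict[str, Set[int]], num_users: int, nu: float) -> List[str]:
    """Greedily add the model that covers the most still-uncovered users
    until at least a nu fraction of users is covered (or no model helps)."""
    target = nu * num_users
    covered: Set[int] = set()
    chosen: List[str] = []
    while len(covered) < target:
        candidates = [m for m in covers if m not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda m: len(covers[m] - covered))
        if not covers[best] - covered:
            break  # No remaining model adds coverage; the target is unreachable.
        chosen.append(best)
        covered |= covers[best]
    return chosen


# Hypothetical per-user errors for four models and eight users.
errors = {
    "A": [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9],
    "B": [0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.9, 0.9],
    "C": [0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.1, 0.9],
    "D": [0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9],
}
covers = coverage_sets(errors, lam=0.2)
portfolio = greedy_portfolio(covers, num_users=8, nu=0.75)
print(portfolio)
```

In this toy table no single model covers 75% of users, but a two-model portfolio does, which mirrors the abstract's point that a small set of models can cover far more users than one global winner.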

2605.06654 2026-05-08 cs.LG cs.AI math.OC

Optimizer-Model Consistency: Full Finetuning with the Same Optimizer as Pretraining Forgets Less

Yuxing Liu, Jianyu Wang, Tong Zhang

2605.06652 2026-05-08 cs.LG cs.AI cs.CL

When No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labels

Sushant Gautam, Finn Schwall, Annika Willoch Olstad, Fernando Vallecillos Ruiz, Birk Torpmann-Hagen, Sunniva Maria Stordal Bjørklund, Leon Moonen, Klas Pettersen, Michael A. Riegler

Comments SimpleAudit Repository: https://github.com/kelkalot/simpleaudit

2605.06650 2026-05-08 cs.CL

Beyond Negative Rollouts: Positive-Only Policy Optimization with Implicit Negative Gradients

Mingwei Xu, Hao Fang

Abstract

Reinforcement learning with verifiable rewards (RLVR), thanks to its deterministic verification, has become a dominant paradigm for enhancing the reasoning ability of large language models (LLMs). The community has witnessed a rapid shift from Proximal Policy Optimization (PPO) to Group Relative Policy Optimization (GRPO), which replaces complicated advantage estimation with a simple estimate over grouped positive and negative rollouts. However, we note that negative rollouts may admit no gradation of failure severity, and the combinatorial vastness makes penalizing a few sampled negatives unlikely to cover a meaningful reward signal under sparse binary rewards. In this work, we propose Positive-Only Policy Optimization (POPO), a novel RLVR framework in which learning can occur exclusively via online positive rollouts. Specifically, POPO utilizes bounded importance sampling over the positive rollout set. Thus, no disjoint negative rollouts are used for the gradient guidance. We show that implicit negative gradients can emerge naturally through reinforcing the positive probability via rollout redistribution. Next, POPO stabilizes the policy optimization through two mechanisms. First, it applies a siamese policy network with a momentum-based adaptation law for stabilized policy evolution. Second, we replace the KL-divergence with a bounded similarity penalty term in the siamese representation space. We conduct extensive experiments using publicly available, well-established text-LLM models, e.g., the Qwen family, across all-level mathematical benchmarks. Our experiments demonstrate that POPO achieves performance comparable or even superior to GRPO. Notably, we show that POPO can achieve 36.67% on AIME 2025 with Qwen-Math-7B, outperforming GRPO's 30.00%. Our ablation and sweep studies further illustrate the necessity and robustness of POPO components.

2605.06646 2026-05-08 cs.LG

Inductive Venn-Abers and related regressors

Ivan Petej, Vladimir Vovk

Comments 33 pages

2605.06643 2026-05-08 cs.CV cs.AI cs.LG cs.MM

Are We Making Progress in Multimodal Domain Generalization? A Comprehensive Benchmark Study

Hao Dong, Hongzhao Li, Shupan Li, Muhammad Haris Khan, Eleni Chatzi, Olga Fink

Comments Code: https://github.com/lihongzhao99/MMDG_Benchmark

Abstract

Despite the growing popularity of Multimodal Domain Generalization (MMDG) for enhancing model robustness, it remains unclear whether reported performance gains reflect genuine algorithmic progress or are artifacts of inconsistent evaluation protocols. Current research is fragmented, with studies varying significantly across datasets, modality configurations, and experimental settings. Furthermore, existing benchmarks focus predominantly on action recognition, often neglecting critical real-world challenges such as input corruptions, missing modalities, and model trustworthiness. This lack of standardization obscures a reliable assessment of the field's advancement. To address this issue, we introduce MMDG-Bench, the first unified and comprehensive benchmark for MMDG, which standardizes evaluation across six datasets spanning three diverse tasks: action recognition, mechanical fault diagnosis, and sentiment analysis. MMDG-Bench encompasses six modality combinations, nine representative methods, and multiple evaluation settings. Beyond standard accuracy, it systematically assesses corruption robustness, missing-modality generalization, misclassification detection, and out-of-distribution detection. With 7,402 neural networks trained in total across 95 unique cross-domain tasks, MMDG-Bench yields five key findings: (1) under fair comparisons, recent specialized MMDG methods offer only marginal improvements over the ERM baseline; (2) no single method consistently outperforms others across datasets or modality combinations; (3) a substantial gap to upper-bound performance persists, indicating that MMDG remains far from solved; (4) trimodal fusion does not consistently outperform the strongest bimodal configurations; and (5) all evaluated methods exhibit significant degradation under corruption and missing-modality scenarios, with some methods further compromising model trustworthiness.

2605.06642 2026-05-08 cs.CL cs.AI

StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction

Xiangyuan Xue, Yifan Zhou, Zidong Wang, Shengji Tang, Philip Torr, Wanli Ouyang, Lei Bai, Zhenfei Yin

Comments 26 pages, 4 figures, 7 tables

2605.06641 2026-05-08 cs.AI cs.CV

GlazyBench: A Benchmark for Ceramic Glaze Property Prediction and Image Generation

Ziyu Zhai, Siyou Li, Juexi Shao, Juntao Yu

2605.06640 2026-05-08 cs.LG cs.AI

Concept-Based Abductive and Contrastive Explanations for Behaviors of Vision Models

Ronaldo Canizales, Divya Gopinath, Corina Păsăreanu, Ravi Mangal

2605.06639 2026-05-08 cs.LG cs.AI cs.CL cs.MA

Recursive Agent Optimization

Apurva Gandhi, Satyaki Chakraborty, Xiangjun Wang, Aviral Kumar, Graham Neubig

2605.06635 2026-05-08 cs.CL

Cited but Not Verified: Parsing and Evaluating Source Attribution in LLM Deep Research Agents

Hailey Onweller, Elias Lumer, Austin Huber, Pia Ramchandani, Vamse Kumar Subbiah, Corey Feld

2605.06632 2026-05-08 cs.LG

Crafting Reversible SFT Behaviors in Large Language Models

Yuping Lin, Pengfei He, Yue Xing, Yingqian Cui, Jiayuan Ding, Subhabrata Mukherjee, Hui Liu, Zhen Xiang

2605.06629 2026-05-08 cs.LG

Hybrid Quantum-Classical GANs for the Generation of Adversarial Network Flows

Prateek Paudel, Nitin Jha, Abhishek Parakh, Mahadevan Subramaniam

Comments 14 pages

Abstract

Classical generative adversarial networks (GANs) have been applied to generate adversarial network traffic capable of attacking intrusion detection systems, but they suffer from shortcomings such as the need for large amounts of high-dimensional datasets, mode collapse, and high computational overhead. In this work, we propose a hybrid quantum-classical GAN (QC-GAN) framework where a variational quantum generator is used to generate synthetic network traffic flows mimicking malicious traffic using latent representations. Instead of sampling classical noise vectors, we encode the latent vector (the hidden features) as a quantum state, which is the basis for claiming more expressive latent representations and reducing computational overhead. A classical discriminator will be trained on real-world datasets (UNSW-NB15) and the proposed QC-GAN-generated fake network flows. In this configuration, the generator aims to minimize the discriminator's ability to distinguish real from fake traffic, while the discriminator aims to maximize its classification accuracy, in an iterative manner. In our attack model, we assume that the attacker is a state actor with access to limited quantum computing power, whereas the discriminator is chosen to be classical, as will likely be the case for most end users and organizations. We test the generated flows using classical intrusion detection system (IDS) models, such as a random forest classifier and a convolutional neural network-based classifier, for their ability to bypass the detection process. This work aims to highlight the possibilities of quantum machine learning as a means of generating advanced attack flows and stress testing classical IDS. Lastly, we further evaluate how hardware-based noise affects these attacks to offer a new perspective on IDS, highlighting the need for a quantum resilient defense system.

2605.06627 2026-05-08 cs.SD cs.LG

PianoCoRe: Combined and Refined Piano MIDI Dataset

Ilya Borovik

Comments Published in TISMIR. Project repository: https://github.com/ilya16/PianoCoRe

DOI: 10.5334/tismir.333
Journal ref: Transactions of the International Society for Music Information Retrieval, 9(1), 144-163, 2026

Abstract

Symbolic music datasets with matched scores and performances are essential for many music information retrieval (MIR) tasks. Yet, existing resources often cover a narrow range of composers, lack performance variety, omit note-level alignments, or use inconsistent naming formats. This work presents PianoCoRe, a large-scale piano MIDI dataset that unifies and refines major open-source piano corpora. The dataset contains 250,046 performances of 5,625 pieces written by 483 composers, totaling 21,763 h of performed music. PianoCoRe is released in tiered subsets to support different applications: from large-scale analysis and pre-training (PianoCoRe-C and deduplicated PianoCoRe-B) to expressive performance modeling with note-level score alignment (PianoCoRe-A/A*). The note-aligned subset, PianoCoRe-A, provides the largest open-source collection of 157,207 performances aligned to 1,591 scores to date. In addition to the dataset, the contributions are: (1) a MIDI quality classifier for detecting corrupted and score-like transcriptions and (2) RAScoP, an alignment refinement pipeline that cleans temporal alignment errors and interpolates missing notes. The analysis shows that the refinement reduces temporal noise and eliminates tempo outliers. Moreover, an expressive performance rendering model trained on PianoCoRe demonstrates improved robustness to unseen pieces compared to models trained on raw or smaller datasets. PianoCoRe provides a ready-to-use foundation for the next generation of expressive piano performance research.

2605.06625 2026-05-08 cs.CL

Parser agreement and disagreement in L2 Korean UD: Implications for human-in-the-loop annotation

Hakyung Sung, Gyu-Ho Shin

Comments To be published in the 20th Linguistic Annotation Workshop

2605.06619 2026-05-08 cs.CL cs.CY

Algospeak, Hiding in the Open: The Trade-off Between Legible Meaning and Detection Avoidance

Jan Fillies, Ronald E. Robertson, Jeffrey Hancock

Comments Under Review

2605.06615 2026-05-08 cs.LG cs.AI cs.CL math.OC

When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds

Hongyi Tao, Dingzhi Yu, Lijun Zhang

Comments Code is available at https://github.com/Dingzhen230/SignSGD_Outperforms_SGD

2605.06614 2026-05-08 cs.AI cs.CL

SkillOS: Learning Skill Curation for Self-Evolving Agents

Siru Ouyang, Jun Yan, Yanfei Chen, Rujun Han, Zifeng Wang, Bhavana Dalvi Mishra, Rui Meng, Chun-Liang Li, Yizhu Jiao, Kaiwen Zha, Maohao Shen, Vishy Tirumalashetty, George Lee, Jiawei Han, Tomas Pfister, Chen-Yu Lee

Comments 11 pages, 6 figures, 3 tables

2605.06612 2026-05-08 cs.LG cs.ET stat.ML

Online Bayesian Calibration under Gradual and Abrupt System Changes

Yang Xu, Chiwoo Park

Abstract

Bayesian model calibration is central to digital twins and computer experiments, as it aligns model outputs with field observations by estimating calibration parameters and correcting systematic model bias. Classical Bayesian calibration introduces latent parameters and a discrepancy function to model bias, but suffers from parameter-discrepancy confounding and is typically formulated as an offline procedure under a stationary data-generating assumption. These limitations are restrictive in modern digital twin applications, where systems evolve over time and may exhibit gradual drift and abrupt regime shifts. While data assimilation methods enable sequential updates, they generally do not explicitly model systematic bias and are less effective under abrupt changes. We propose Bayesian Recursive Projected Calibration (BRPC), an online Bayesian calibration framework for streaming data under simulator mismatch and nonstationarity. BRPC extends projected calibration to the online setting by separating a discrepancy-free particle update for calibration parameters from a conditional Gaussian process update for discrepancy, preserving identifiability while enabling bias-aware adaptation under gradual system evolution. To handle abrupt changes, BRPC is integrated with restart mechanisms that detect regime shifts and reset the calibration process. We establish theoretical guarantees for both components, including tracking performance under gradual evolution and false-alarm and detection behavior for restart mechanisms. Empirical studies on synthetic and plant-simulation benchmarks show that BRPC improves calibration accuracy under gradual changes, while restart-augmented BRPC further improves robustness and predictive performance under abrupt regime shifts compared to sliding-window Bayesian calibration and data assimilation baselines.

2605.06611 2026-05-08 cs.LG cs.AI stat.ML

The Structural Origin of Attention Sink: Variance Discrepancy, Super Neurons, and Dimension Disparity

Siquan Li, Kaiqi Jiang, Jiacheng Sun, Tianyang Hu

Comments Accepted to ICML 2026

2605.06609 2026-05-08 cs.LG stat.ML

Transformers Efficiently Perform In-Context Logistic Regression via Normalized Gradient Descent

Chenyang Zhang, Yuan Cao

Comments 94 pages, 8 figures

2605.06605 2026-05-08 cs.LG

How Many Iterations to Jailbreak? Dynamic Budget Allocation for Multi-Turn LLM Evaluation

Shai Feldman, Yaniv Romano

2605.06599 2026-05-08 cs.LG eess.AS

Weight-Decay Turns Transformer Loss Landscapes Villani: Functional-Analytic Foundations for Optimization and Generalization

Abhijit Das, Sayantan Dutta

Comments 17 pages, 10 figures

2605.06595 2026-05-08 cs.RO cs.AI cs.LG cs.MA

Cross-Modal Navigation with Multi-Agent Reinforcement Learning

Shuo Liu, Xinzichen Li, Christopher Amato