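arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.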

2605.00825 2026-05-04 cs.CV

Posterior Augmented Flow Matching

George Stoica, Sayak Paul, Matthew Wallingford, Vivek Ramanujan, Abhay Nori, Winson Han, Ali Farhadi, Ranjay Krishna, Judy Hoffman

Abstract

Flow matching (FM) trains a time-dependent vector field that transports samples from a simple prior to a complex data distribution. However, for high-dimensional images, each training sample supervises only a single trajectory and intermediate point, yielding an extremely sparse and high-variance training signal. This under-constrained supervision can cause flow collapse, where the learned dynamics memorize specific source-target pairings, mapping diverse inputs to overly similar outputs, failing to generalize. We introduce Posterior-Augmented Flow Matching (PAFM), a theoretically grounded generalization of FM that replaces single-target supervision with an expectation over an approximate posterior of valid target completions for a given intermediate state and condition. PAFM factorizes this intractable posterior into (i) the likelihood of the intermediate under a hypothesized endpoint and (ii) the prior probability of that endpoint under the condition, and uses an importance sampling scheme to construct a mixture over multiple candidate targets. We prove that PAFM yields an unbiased estimator of the original FM objective while substantially reducing gradient variance during training by aggregating information from many plausible continuation trajectories per intermediate. Finally, we show that PAFM improves over FM by up to 3.4 FID50K across different model scales (SiT-B/2 and SiT-XL/2), different architectures (SiT and MMDiT), and in both class and text conditioned benchmarks (ImageNet and CC12M), with a negligible increase in the compute overhead. Code: https://github.com/gstoica27/PAFM.git.
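To make the posterior-weighted supervision described in this abstract more concrete, here is a minimal NumPy sketch. It is not taken from the paper or its repository. It assumes a linear interpolation path x_t = (1 - t) x_0 + t x_1 with a standard Gaussian source, a small set of candidate endpoints, and self-normalized weights; the function name `posterior_weighted_target` and its arguments are hypothetical. Each candidate is weighted by the likelihood of the intermediate point under that endpoint times the endpoint's prior, and the regression target is the weighted mixture of per-candidate velocities.

```python
import numpy as np

def posterior_weighted_target(x_t, t, candidates, log_prior=None):
    """Mix per-candidate velocities using approximate posterior weights.

    Assumptions (illustrative, not the paper's implementation):
      * linear path x_t = (1 - t) * x_0 + t * x_1 with x_0 ~ N(0, I), t < 1
      * so x_t given x_1 is N(t * x_1, (1 - t)^2 I)
      * log_prior holds log p(x_1 | condition) per candidate (uniform if None)
    """
    x_t = np.asarray(x_t, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    n = candidates.shape[0]
    if log_prior is None:
        log_prior = np.full(n, -np.log(n))

    sigma = 1.0 - t
    # Log-likelihood of x_t under each hypothesized endpoint, dropping the
    # normalizing constant that is shared by all candidates.
    sq_dist = np.sum((x_t - t * candidates) ** 2, axis=1)
    log_w = -0.5 * sq_dist / sigma**2 + np.asarray(log_prior, dtype=float)

    # Self-normalize in a numerically stable way.
    log_w -= log_w.max()
    w = np.exp(log_w)
    w /= w.sum()

    # Velocity that carries x_t to each candidate along the linear path.
    velocities = (candidates - x_t) / sigma
    return w @ velocities, w


# Two candidates placed symmetrically around x_t get equal weight,
# so their opposite velocities cancel in the mixture.
target, weights = posterior_weighted_target(
    [0.0, 0.0], 0.5, [[1.0, 0.0], [-1.0, 0.0]]
)
print(weights, target)
```

With a single candidate the weight is 1 and the target reduces to the usual single-endpoint flow matching velocity, which is the case the abstract says PAFM generalizes.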

2605.00800 2026-05-04 cs.LG

Generating Statistical Charts with Validation-Driven LLM Workflows

Pavlin G. Poličar, Andraž Pevcin, Blaž Zupan

2605.00799 2026-05-04 cs.CV

GMGaze: MoE-Based Context-Aware Gaze Estimation with CLIP and Multiscale Transformer

Xinyuan Zhao, Yihang Wu, Ahmad Chaddad, Sarah A. Alkhodair, Reem Kateb

Comments Accepted in KBS

2605.00798 2026-05-04 cs.LG cs.CL cs.MA

RunAgent: Interpreting Natural-Language Plans with Constraint-Guided Execution

Arunabh Srivastava, Mohammad A. Khojastepour, Srimat Chakradhar, Sennur Ulukus

2605.00789 2026-05-04 cs.CV cs.AI cs.LG

Make Your LVLM KV Cache More Lightweight

Xihao Chen, Yangyang Guo, Roger Zimmermann

Comments Accepted to Transactions on Machine Learning Research (TMLR), 2026

2605.00787 2026-05-04 cs.LG

SAVGO: Learning State-Action Value Geometry with Cosine Similarity for Continuous Control

Stavros Orfanoudakis, Pedro P. Vergara

Comments Reinforcement Learning

2605.00781 2026-05-04 cs.CV

Map2World: Segment Map Conditioned Text to 3D World Generation

Jaeyoung Chung, Suyoung Lee, Jianfeng Xiang, Jiaolong Yang, Kyoung Mu Lee

Comments project page: https://robot0321.github.io/Map2World/index.html

2605.00778 2026-05-04 cs.LG q-bio.NC

Observable Performance Does Not Fully Reflect System Organization: A Multi-Level Analysis of Gait Dynamics Under Occlusal Constraint

Jacques Raynal, Pierre Slangen, Jacques Margerit

Comments 1 table, 4 figures. Exploratory single-case study

2605.00777 2026-05-04 cs.SD cs.CL eess.AS

LASE: Language-Adversarial Speaker Encoding for Indic Cross-Script Identity Preservation

Venkata Pushpak Teja Menta

Comments 7 pages, 2 figures, 2 tables. Code, model, and datasets at https://github.com/praxelhq/lase

2605.00776 2026-05-04 cs.CL cs.AI

Directed Social Regard: Surfacing Targeted Advocacy, Opposition, Aid, Harms, and Victimization in Online Media

Scott Friedman, Ruta Wheelock, Sonja Schmer-Galunder, Drisana Iverson, Jake Vasilakes, Joan Zheng, Jeffrey Rye, Vasanth Sarathy, Christopher Miller

Comments 32 pages, 12 figures, 7 tables

2605.00764 2026-05-04 cs.CV cs.HC

Modeling Subjective Urban Perception with Human Gaze

Lin Che, Xi Wang, Marc Pollefeys, Konrad Schindler, Martin Raubal, Peter Kiefer

2605.00762 2026-05-04 cs.LG cs.AI cs.MA

Meritocratic Fairness in Budgeted Combinatorial Multi-armed Bandits via Shapley Values

Shradha Sharma, Swapnil Dhamal, Shweta Jain

2605.00760 2026-05-04 cs.LG

Learning the Helmholtz equation operator with DeepONet for non-parametric 2D geometries

Rodolphe Barlogis, Ferhat Tamssaouet, Quentin Falcoz, Stéphane Grieu

Comments 24 pages, 16 figures

2605.00751 2026-05-04 cs.LG

NonZero: Interaction-Guided Exploration for Multi-Agent Monte Carlo Tree Search

Sizhe Tang, Zuyuan Zhang, Mahdi Imani, Tian Lan

Comments Accepted by ICML 2026 as Spotlight

2605.00744 2026-05-04 cs.CV

Quantum Gradient-Based Approach for Edge and Corner Detection Using Sobel Kernels

Mohammad Aamir Sohail, Gabriela Pinheiro, Yasemin Poyraz Kocak, Batuhan Hangun, Emre Camkerten, Simge Yigit, Hafize Asude Ertan

2605.00738 2026-05-04 cs.LG

Temporal Data Requirement for Predicting Unplanned Hospital Readmissions

Ramin Mohammadi, Vahab Vahdat, Sarthak Jain, Amir T. Namin, Ramya Palacholla, Sagar Kamarthi

2604.27977 2026-05-04 cs.AI cs.LG

D3-Gym: Constructing Real-World Verifiable Environments for Data-Driven Discovery

Hanane Nour Moussa, Yifei Li, Zhuoyang Li, Yankai Yang, Cheng Tang, Tianshu Zhang, Nesreen K. Ahmed, Ali Payani, Ziru Chen, Huan Sun

2604.10418 2026-05-04 cs.CL

Turing or Cantor: That is the Question

Eugene Eberbach

Comments arXiv admin note: text overlap with arXiv:2106.15969

2604.06940 2026-05-04 cs.LG cs.AI

A First Guess is Rarely the Final Answer: Learning to Search in the Traveling Salesperson Problem

Andoni Irazusta Garmendia

2604.04385 2026-05-04 cs.CL cs.AI cs.LG

How Alignment Routes: Localizing, Scaling, and Controlling Policy Circuits in Language Models

Gregory N. Frank

Comments Code and data: https://github.com/gregfrank/how-alignment-routes

Abstract

We localize the policy routing mechanism in alignment-trained language models. An intermediate-layer attention gate reads detected content and triggers deeper amplifier heads that boost the signal toward refusal. In smaller models the gate and amplifier are single heads; at larger scale they become bands of heads across adjacent layers. The gate contributes under 1% of output DLA, yet interchange testing (p < 0.001) and knockout cascade confirm it is causally necessary. Interchange screening at n >= 120 detects the same motif in twelve models from six labs (2B to 72B), though specific heads differ by lab. Per-head ablation weakens up to 58x at 72B and misses gates that interchange identifies; at scale, interchange is the only reliable audit. Modulating the detection-layer signal continuously controls policy from hard refusal through evasion to factual answering. On safety prompts the same intervention turns refusal into harmful guidance, showing that the safety-trained capability is gated by routing, not removed. Thresholds vary by topic and by input language, and the circuit relocates across generations within a family even while behavioral benchmarks register no change. Routing is early-commitment: the gate fires at its own layer before deeper layers finish processing the input. An in-context substitution cipher collapses gate interchange necessity by 70 to 99% across three models, and the model switches to puzzle-solving rather than refusal. Injecting the plaintext gate activation into the cipher forward pass restores 48% of refusals in Phi-4-mini, localizing the bypass to the routing interface. A second method, cipher contrast analysis, uses plain/cipher DLA differences to map the full cipher-sensitive routing circuit in O(3n) forward passes. Any encoding that defeats detection-layer pattern matching bypasses the policy regardless of whether deeper layers reconstruct the content.

2603.28980 2026-05-04 cs.CV

Stepper: Stepwise Immersive Scene Generation with Multiview Panoramas

Felix Wimbauer, Fabian Manhardt, Michael Oechsle, Nikolai Kalischek, Christian Rupprecht, Daniel Cremers, Federico Tombari

Comments Accepted at CVPR 2026 Findings; Find our project page under https://fwmb.github.io/stepper/

2603.18280 2026-05-04 cs.LG cs.AI cs.CL

Detection Is Cheap, Routing Is Learned: Why Refusal-Based Alignment Evaluation Fails

Gregory N. Frank

Comments Code and data: https://github.com/gregfrank/routing-is-learned

2602.14276 2026-05-04 cs.CV

ScreenParse: Moving Beyond Sparse Grounding with Complete Screen Parsing Supervision

A. Said Gurbuz, Sunghwan Hong, Ahmed Nassar, Marc Pollefeys, Peter Staar

Comments Accepted at ICML 2026. 28 pages, 15 figures

2602.13595 2026-05-04 cs.AI

The Quantization Trap: Breaking Linear Scaling Laws in Multi-Hop Reasoning

Henry Han, Xiyang Liu, Xiaodong Wang, Fei Han, Xiaodong Li

Comments 23 pages, 8 figures

2602.13305 2026-05-04 cs.CV cs.AI

WildfireVLM: AI-powered Analysis for Early Wildfire Detection and Risk Assessment Using Satellite Imagery

Aydin Ayanzadeh, Prakhar Dixit, Sadia Kamal, Milton Halem

2601.21214 2026-05-04 cs.CL cs.LG

Scaling Reasoning Hop Exposes Weaknesses: Demystifying and Improving Hop Generalization in Large Language Models

Zhaoyi Li, Jiatong Li, Gangwei Jiang, Linqi Song, Defu Lian, Ying Wei

Comments 52 pages, accepted by ICLR 2026 main conference

2512.16762 2026-05-04 cs.LG

NRGPT: An Energy-based Alternative for GPT

Nima Dehmamy, Benjamin Hoover, Bishwajit Saha, Leo Kozachkov, Jean-Jacques Slotine, Dmitry Krotov

Comments Accepted to ICLR 2026 main conference

2512.01116 2026-05-04 cs.CV

Structural Prognostic Event Modeling for Multimodal Cancer Survival Analysis

Yilan Zhang, Li Nanbo, Changchun Yang, Jürgen Schmidhuber, Xin Gao

Comments 37 pages, 14 Figures

2510.22819 2026-05-04 cs.LG

Last-Iterate Analyses of FTRL with the 1/2-Tsallis Entropy in Stochastic Bandits

Jingxin Zhan, Yuze Han, Zhihua Zhang

2507.22699 2026-05-04 cs.CV

Image-Guided Shape-from-Template Using Mesh Inextensibility Constraints

Thuy Tran, Ruochen Chen, Shaifali Parashar

Comments Accepted to ICCV 2025. Total 13 pages, 9 figures, 9 tables