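arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.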

2511.06626 2026-03-24 cs.AI

Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives

Chloe Li, Mary Phuong, Daniel Tan

Details

English abstract

As AI systems become more capable of complex agentic tasks, they also become more capable of pursuing undesirable objectives and causing harm. Previous work has attempted to catch these unsafe instances by interrogating models directly about their objectives and behaviors. However, the main weakness of trusting interrogations is that models can lie. We propose self-report fine-tuning (SRFT), a simple supervised fine-tuning technique that trains models to occasionally make factual mistakes, then admit them when asked. We show that the admission of factual errors in simple question-answering settings generalizes out-of-distribution (OOD) to the admission of hidden misaligned objectives in adversarial agentic settings. We evaluate SRFT in OOD stealth tasks, where models are instructed to complete a hidden misaligned objective alongside a user-specified objective without being caught by monitoring. After SRFT, models are more likely to confess the details of their hidden objectives when interrogated, even under strong pressure not to disclose them. Interrogation on SRFT models can detect hidden objectives with near-ceiling performance (F1 score = 0.98), while the baseline model lies when interrogated under the same conditions (F1 score = 0). Interrogation on SRFT models can further elicit the content of the hidden objective, recovering 28-100% details, compared to 0% details recovered in the baseline model and by prefilled assistant turn attacks. This provides a promising technique for promoting honesty propensity and incriminating misaligned AIs.


2511.05421 2026-03-24 cs.CV

Sharing the Learned Knowledge-base to Estimate Convolutional Filter Parameters for Continual Image Restoration

Aupendu Kar, Krishnendu Ghosh, Prabir Kumar Biswas

Comments This paper has been accepted to ACM ICVGIP 2025

2511.03237 2026-03-24 cs.CL

MUTANT: A Recipe for Multilingual Tokenizer Design

Souvik Rana, Arul Menezes, Ashish Kulkarni, Chandra Khatri, Shubham Agarwal

2511.01571 2026-03-24 cs.CV cs.RO

PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model

Wenqi Liang, Gan Sun, Yao He, Jiahua Dong, Suyan Dai, Ivan Laptev, Salman Khan, Yang Cong

Comments 17 pages, 7 figures, 5 tables

2510.27543 2026-03-24 cs.CL cs.AI

DialectalArabicMMLU: Benchmarking Dialectal Capabilities in Arabic and Multilingual Language Models

Malik H. Altakrori, Nizar Habash, Abed Alhakim Freihat, Younes Samih, Kirill Chirkunov, Muhammed AbuOdeh, Radu Florian, Teresa Lynn, Preslav Nakov, Alham Fikri Aji

Comments 9 pages, 10 tables, accepted to LREC 2026

2510.23049 2026-03-24 cs.LG cs.AI

Advantage Shaping as Surrogate Reward Maximization: Unifying Pass@K Policy Gradients

Christos Thrampoulidis, Sadegh Mahdavi, Wenlong Deng

Comments v3: Camera-ready version (TMLR)

2510.21271 2026-03-24 cs.LG cs.CV

Buffer layers for Test-Time Adaptation

Hyeongyu Kim, Geonhui Han, Dosik Hwang

Comments NeurIPS 2025

2510.12060 2026-03-24 cs.LG cs.AI cs.CV

Your VAR Model is Secretly an Efficient and Explainable Generative Classifier

Yi-Chung Chen, David I. Inouye, Jing Gao

Comments ICLR 2026

2510.11026 2026-03-24 cs.CV

GIR-Bench: Versatile Benchmark for Generating Images with Reasoning

Hongxiang Li, Yaowei Li, Bin Lin, Yuwei Niu, Yuhang Yang, Xiaoshuang Huang, Jiayin Cai, Xiaolong Jiang, Yao Hu, Long Chen

Comments ICLR 2026

2510.08771 2026-03-24 cs.CV

LinearSR: Unlocking Linear Attention for Stable and Efficient Image Super-Resolution

Xiaohui Li, Shaobin Zhuang, Shuo Cao, Yang Yang, Yuandong Pu, Qi Qin, Siqi Luo, Bin Fu, Yihao Liu

Comments Camera-ready version for ICLR 2026

2510.08713 2026-03-24 cs.AI cs.CV cs.RO

Towards Unified World Models for Visual Navigation via Memory-Augmented Planning and Foresight

Yifei Dong, Fengyi Wu, Guangyu Chen, Lingdong Kong, Xu Zhu, Qiyu Hu, Yuxuan Zhou, Jingdong Sun, Jun-Yan He, Qi Dai, Alexander G. Hauptmann, Zhi-Qi Cheng

Comments 21 pages, 12 figures, code: https://github.com/F1y1113/UniWM

2510.06638 2026-03-24 cs.CV cs.AI

StaR-KVQA: Structured Reasoning Traces for Implicit-Knowledge Visual Question Answering

Zhihao Wen, Wenkang Wei, Yuan Fang, Xingtong Yu, Hui Zhang, Weicheng Zhu, Xin Zhang

Comments 8+3+3 pages, code: https://github.com/jianyingzhihe/StaR-KVQA

2510.06552 2026-03-24 cs.CL

Flipping the Dialogue: Training and Evaluating User Language Models

Tarek Naous, Philippe Laban, Wei Xu, Jennifer Neville

Comments Accepted at ICLR 2026

2510.06199 2026-03-24 cs.RO

DYMO-Hair: Generalizable Volumetric Dynamics Modeling for Robot Hair Manipulation

Chengyang Zhao, Uksang Yoo, Arkadeep Narayan Chaudhury, Giljoo Nam, Jonathan Francis, Jeffrey Ichnowski, Jean Oh

Comments To appear in ICRA 2026. Project page: https://dymohair.github.io/

2510.05416 2026-03-24 cs.LG

Correlating Cross-Iteration Noise for DP-SGD using Model Curvature

Xin Gu, Yingtai Xiao, Guanlin He, Jiamu Bai, Daniel Kifer, Kiwan Maeng

2510.05092 2026-03-24 cs.LG cs.AI cs.CL

Learning to Interpret Weight Differences in Language Models

Avichal Goel, Yoon Kim, Nir Shavit, Tony T. Wang

Comments Project code and links to weight diffs, adapters, and training data can be found at https://github.com/Aviously/diff-interpretation-tuning

2510.03798 2026-03-24 cs.LG stat.ML

Robust Batched Bandits

Yunwen Guo, Yunlun Shu, Gongyi Zhuo, Tianyu Wang

Comments 39 pages

2510.02249 2026-03-24 cs.CL cs.AI cs.LG

Explore Briefly, Then Decide: Mitigating LLM Overthinking via Cumulative Entropy Regulation

Yi Bin, Tianyi Jiang, Yujuan Ding, Kainian Zhu, Fei Ma, Jingkuan Song, Yang Yang, Heng Tao Shen

Comments Code: https://github.com/AusertDream/CumulativeEntropyRegulation

2510.01641 2026-03-24 cs.CV

FideDiff: Efficient Diffusion Model for High-Fidelity Image Motion Deblurring

Xiaoyang Liu, Zhengyan Zhou, Zihang Xu, Jiezhang Cao, Zheng Chen, Yulun Zhang

Comments Accepted to ICLR 2026. Code is available at https://github.com/xyLiu339/FideDiff

2509.24817 2026-03-24 cs.CV

UP2You: Fast Reconstruction of Yourself from Unconstrained Photo Collections

Zeyu Cai, Ziyang Li, Xiaoben Li, Boqian Li, Zeyu Wang, Zhenyu Zhang, Yuliang Xiu

Comments Page: https://zcai0612.github.io/UP2You Code: https://github.com/zcai0612/UP2You

Details

Journal ref: International Conference on Learning Representations (ICLR), 2026

English abstract

We present UP2You, the first tuning-free solution for reconstructing high-fidelity 3D clothed portraits from extremely unconstrained in-the-wild 2D photos. Unlike previous approaches that require "clean" inputs (e.g., full-body images with minimal occlusions, or well-calibrated cross-view captures), UP2You directly processes raw, unstructured photographs, which may vary significantly in pose, viewpoint, cropping, and occlusion. Instead of compressing data into tokens for slow online text-to-3D optimization, we introduce a data rectifier paradigm that efficiently converts unconstrained inputs into clean, orthogonal multi-view images in a single forward pass within seconds, simplifying the 3D reconstruction. Central to UP2You is a pose-correlated feature aggregation module (PCFA), that selectively fuses information from multiple reference images w.r.t. target poses, enabling better identity preservation and nearly constant memory footprint, with more observations. We also introduce a perceiver-based multi-reference shape predictor, removing the need for pre-captured body templates. Extensive experiments on 4D-Dress, PuzzleIOI, and in-the-wild captures demonstrate that UP2You consistently surpasses previous methods in both geometric accuracy (Chamfer-15%, P2S-18% on PuzzleIOI) and texture fidelity (PSNR-21%, LPIPS-46% on 4D-Dress). UP2You is efficient (1.5 minutes per person), and versatile (supports arbitrary pose control, and training-free multi-garment 3D virtual try-on), making it practical for real-world scenarios where humans are casually captured. Both models and code will be released to facilitate future research on this underexplored task. Project Page: https://zcai0612.github.io/UP2You


2509.23774 2026-03-24 cs.CV

Texture Vector-Quantization and Reconstruction Aware Prediction for Generative Super-Resolution

Qifan Li, Jiale Zou, Jinhua Zhang, Wei Long, Xingyu Zhou, Shuhang Gu

Comments Accepted to ICLR 2026

2509.21690 2026-03-24 cs.RO

PACE: Physics Augmentation for Coordinated End-to-end Reinforcement Learning toward Versatile Humanoid Table Tennis

Muqun Hu, Wenxi Chen, Wenjing Li, Falak Mandali, Zijian He, Renhong Zhang, Praveen Krisna, Katherine Christian, Leo Benaharon, Dizhi Ma, Karthik Ramani, Yan Gu

2509.21305 2026-03-24 cs.CL

Sycophancy Is Not One Thing: Causal Separation of Sycophantic Behaviors in LLMs

Daniel Vennemeyer, Phan Anh Duong, Tiffany Zhan, Tianyu Jiang

2509.20721 2026-03-24 cs.LG math.ST stat.ML stat.TH

Scaling Laws are Redundancy Laws

Yuda Bi, Vince D Calhoun

Comments This is not a serious research at this time

2509.18131 2026-03-24 cs.LG cs.AI

Randomness and signal propagation in physics-informed neural networks (PINNs): A neural PDE perspective

Jean-Michel Tucny, Abhisek Ganguly, Santosh Ansumali, Sauro Succi

Details

DOI: 10.1140/epjp/s13360-026-07549-0
Journal ref: Tucny, JM., Ganguly, A., Ansumali, S. et al. Randomness and signal propagation in physics-informed neural networks (PINNs): a neural PDE perspective. Eur. Phys. J. Plus 141, 321 (2026)

English abstract

Physics-informed neural networks (PINNs) often exhibit weight matrices that appear statistically random after training, yet their implications for signal propagation and stability remain unsatisfactorily understood, let alone the interpretability. In this work, we analyze the spectral and statistical properties of trained PINN weights using viscous and inviscid variants of the one-dimensional Burgers' equation, and show that the learned weights reside in a high-entropy regime consistent with predictions from random matrix theory. To investigate the dynamical consequences of such weight structures, we study the evolution of signal features inside a network through the lens of neural partial differential equations (neural PDEs). We show that random and structured weight matrices can be associated with specific discretizations of neural PDEs, and that the numerical stability of these discretizations governs the stability of signal propagation through the network. In particular, explicit unstable schemes lead to degraded signal evolution, whereas stable implicit and higher-order schemes yield well-behaved dynamics for the same underlying neural PDE. Our results offer an explicit example of how numerical stability and network architecture shape signal propagation in deep networks, in relation to random matrix and neural PDE descriptions in PINNs.


2509.17340 2026-03-24 cs.RO cs.SY eess.SY

AERO-MPPI: Anchor-Guided Ensemble Trajectory Optimization for Agile Mapless Drone Navigation

Xin Chen, Rui Huang, Longbin Tang, Lin Zhao

Comments Accepted by ICRA 2026

2509.16449 2026-03-24 cs.CL cs.AI

PersonaMatrix: A Recipe for Persona-Aware Evaluation of Legal Summarization

Tsz Fung Pang, Maryam Berijanian, Thomas Orth, Breanna Shi, Charlotte S. Alexander

Comments Accepted for publication in JURIX 2025 (Legal Knowledge and Information Systems, FAIA series, IOS Press). Long Paper

2509.14147 2026-03-24 cs.RO

StableTracker: Learning to Stably Track Target via Differentiable Simulation

Fanxing Li, Shengyang Wang, Fangyu Sun, Shuyu Wu, Dexin Zuo, Yufei Yan, Wenxian Yu, Danping Zou

2509.12544 2026-03-24 cs.CV

Neural Collapse-Inspired Multi-Label Federated Learning under Label-Distribution Skew

Can Peng, Yuyuan Liu, Yingyu Yang, Pramit Saha, Qianye Yang, J. Alison Noble

2509.09899 2026-03-24 cs.LG

Variational Neural Networks for Observable Thermodynamics (V-NOTS)

Christopher Eldred, François Gay-Balmaz, Vakhtang Putkaradze

Comments 31 pages, 6 figures