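arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.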

2510.24038 2026-03-02 cs.CV cs.MA

Enhancing CLIP Robustness via Cross-Modality Alignment

Xingyu Zhu, Beier Zhu, Shuo Wang, Kesen Zhao, Hanwang Zhang

Comments NeurIPS 2025 Spotlight

Details

English abstract

Vision-language models (VLMs) such as CLIP demonstrate strong generalization in zero-shot classification but remain highly vulnerable to adversarial perturbations. Existing methods primarily focus on adversarial fine-tuning or prompt optimization; they often overlook the gap in CLIP's encoded features, in which text and image features lie far apart from each other. This misalignment is significantly amplified under adversarial perturbations, leading to severe degradation in classification performance. To address this problem, we propose Cross-modality Alignment, dubbed COLA, an optimal transport-based framework that explicitly addresses adversarial misalignment by restoring both global image-text alignment and local structural consistency in the feature space. (1) COLA first projects adversarial image embeddings onto a subspace spanned by class text features, effectively filtering out non-semantic distortions while preserving discriminative information. (2) It then models images and texts as discrete distributions over multiple augmented views and refines their alignment via OT, with the subspace projection seamlessly integrated into the cost computation. This design ensures stable cross-modal alignment even under adversarial conditions. COLA is training-free and compatible with existing fine-tuned models. Extensive evaluations across 14 zero-shot classification benchmarks demonstrate the effectiveness of COLA, especially with an average improvement of 6.7% on ImageNet and its variants under PGD adversarial attacks, while maintaining high accuracy on clean samples.
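The two steps named in the abstract (projection onto the text-feature subspace, then optimal-transport alignment) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the cosine cost, the uniform marginals, and the entropic Sinkhorn solver are all assumptions chosen for illustration, and the features here are random stand-ins rather than CLIP embeddings.

```python
import numpy as np

def project_onto_text_subspace(image_emb, text_feats):
    """Orthogonally project image embeddings (n, d) onto the span of the
    class text features (k, d). Assumes k <= d and full row rank."""
    # Orthonormal basis of the text-feature subspace via reduced QR.
    q, _ = np.linalg.qr(text_feats.T)  # q has shape (d, k)
    return image_emb @ q @ q.T

def cosine_cost(x, y):
    """Pairwise cosine distance (1 - cosine similarity) between rows."""
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    yn = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - xn @ yn.T

def sinkhorn(cost, a, b, eps=0.5, n_iters=200):
    """Entropic-regularized OT plan between marginals a and b
    (a generic Sinkhorn iteration, assumed here as the OT solver)."""
    kernel = np.exp(-cost / eps)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    return u[:, None] * kernel * v[None, :]

# Hypothetical stand-ins: 3 augmented image views, 4 text views, dimension 16.
rng = np.random.default_rng(0)
text_feats = rng.normal(size=(4, 16))
image_views = rng.normal(size=(3, 16))

projected = project_onto_text_subspace(image_views, text_feats)
cost = cosine_cost(projected, text_feats)  # projection feeds into the cost
plan = sinkhorn(cost, np.full(3, 1 / 3), np.full(4, 1 / 4))
ot_distance = float((plan * cost).sum())   # lower means better aligned
```

In this sketch the projection removes the component of each image embedding that lies outside the text subspace, and the resulting transport cost can be compared across classes to pick a label; how COLA actually builds its views and scores is described in the paper itself.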


2510.23299 2026-03-02 cs.CV cs.MM

MMSD3.0: A Multi-Image Benchmark for Real-World Multimodal Sarcasm Detection

Haochen Zhao, Yuyao Kong, Yongxiu Xu, Gaopeng Gou, Hongbo Xu, Yubin Wang, Haoliang Zhang

2510.22543 2026-03-02 cs.LG

FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning

Yuyang Ding, Chi Zhang, Juntao Li, Haibin Lin, Min Zhang

Comments ICLR 2026. Project page: https://fapo-rl.github.io/; Infra Doc: https://verl.readthedocs.io/en/latest/advance/reward_loop.html

2510.21171 2026-03-02 cs.CV

TokenCLIP: Token-wise Prompt Learning for Zero-shot Anomaly Detection

Qihang Zhou, Binbin Gao, Guansong Pang, Xin Wang, Jiming Chen, Shibo He

2510.20812 2026-03-02 cs.CV cs.AI cs.CL

Small Drafts, Big Verdict: Information-Intensive Visual Reasoning via Speculation

Yuhan Liu, Lianhui Qin, Shengjie Wang

Comments Accepted to ICLR 2026

2510.18101 2026-03-02 cs.CV

From Volume Rendering to 3D Gaussian Splatting: Theory and Applications

Vitor Pereira Matias, Daniel Perazzo, Vinicius Silva, Alberto Raposo, Luiz Velho, Afonso Paiva, Tiago Novello

Comments Accepted at the Conference on Graphics, Patterns and Images (SIBGRAPI), math focused, 5 equations, 5 figures, 5 pages of text and 1 of bibliography

2510.17480 2026-03-02 cs.LG

Unified Privacy Guarantees for Decentralized Learning via Matrix Factorization

Aurélien Bellet, Edwige Cyffers, Davide Frey, Romaric Gaudel, Dimitri Lerévérend, François Taïani

Comments Accepted at ICLR 2026. 23 pages, 6 figures

2510.17268 2026-03-02 cs.LG stat.ML

Uncertainty-aware data assimilation through variational inference

Anthony Frion, David S Greenberg

2510.13358 2026-03-02 cs.RO cs.AI

Adversarial Fine-tuning in Offline-to-Online Reinforcement Learning for Robust Robot Control

Shingo Ayabe, Hiroshi Kera, Kazuhiko Kawamoto

Comments 15 main pages, 8 supplementary material pages

2510.13328 2026-03-02 cs.LG cs.AI

Thompson Sampling via Fine-Tuning of LLMs

Nicolas Menet, Aleksandar Terzić, Michael Hersche, Andreas Krause, Abbas Rahimi

Comments accepted at ICLR 2026

2510.12768 2026-03-02 cs.CV cs.AI cs.GR

Uncertainty Matters in Dynamic Gaussian Splatting for Monocular 4D Reconstruction

Fengzhi Guo, Chih-Chuan Hsu, Sihao Ding, Cheng Zhang

Comments Accepted to ICLR 2026. Project page: https://tamu-visual-ai.github.io/usplat4d/

2510.06730 2026-03-02 cs.CL

PTEB: Towards Robust Text Embedding Evaluation via Stochastic Paraphrasing at Evaluation Time with LLMs

Manuel Frank, Haithem Afli

Comments EACL 2026 (Main)

2510.05930 2026-03-02 cs.LG cs.AI math.DG

Carré du champ flow matching: better quality-generalisation tradeoff in generative models

Jacob Bamberger, Iolo Jones, Dennis Duncan, Michael M. Bronstein, Pierre Vandergheynst, Adam Gosztolai

2510.05535 2026-03-02 cs.LG cs.AI

Permutation-Invariant Representation Learning for Robust and Privacy-Preserving Feature Selection

Rui Liu, Tao Zhe, Yanjie Fu, Feng Xia, Ted Senator, Dongjie Wang

Comments We note that this work has been reproduced without authorization by Stchingtana Naryso and Zihang Yang under the title "Robust and Privacy-Preserving Feature Selection: A Permutation-Invariant Representation Learning Approach with Federated Extension." Their version retains the same technical content, with only the title and abstract changed. This version is the authoritative and original source. arXiv admin note: substantial text overlap with arXiv:2505.11601

2510.05228 2026-03-02 cs.LG cs.AI

CMT-Benchmark: A Benchmark for Condensed Matter Theory Built by Expert Researchers

Haining Pan, James V. Roggeveen, Erez Berg, Juan Carrasquilla, Debanjan Chowdhury, Surya Ganguli, Federico Ghimenti, Juraj Hasik, Henry Hunt, Hong-Chen Jiang, Mason Kamb, Ying-Jer Kao, Ehsan Khatami, Michael J. Lawler, Di Luo, Titus Neupert, Xiaoliang Qi, Michael P. Brenner, Eun-Ah Kim

Comments CMT-Benchmark dataset is available at https://huggingface.co/datasets/JVRoggeveen/cmt_benchmark. CMT-Benchmark was referenced in the Gemini 3 Deep Think (February 2026) release at https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-deep-think/

Details

Journal ref: International Conference on Learning Representations (ICLR) main conference 2026

English abstract

Large language models (LLMs) have shown remarkable progress in coding and math problem-solving, but evaluation on advanced research-level problems in hard sciences remains scarce. To fill this gap, we present CMT-Benchmark, a dataset of 50 problems covering condensed matter theory (CMT) at the level of an expert researcher. Topics span analytical and computational approaches in quantum many-body, and classical statistical mechanics. The dataset was designed and verified by a panel of expert researchers from around the world. We built the dataset through a collaborative environment that challenges the panel to write and refine problems they would want a research assistant to solve, including Hartree-Fock, exact diagonalization, quantum/variational Monte Carlo, density matrix renormalization group (DMRG), quantum/classical statistical mechanics, and model building. We evaluate LLMs by programmatically checking solutions against expert-supplied ground truth. We developed machine-grading, including symbolic handling of non-commuting operators via normal ordering. They generalize across tasks too. Our evaluations show that frontier models struggle with all of the problems in the dataset, highlighting a gap in the physical reasoning skills of current LLMs. Notably, experts identified strategies for creating increasingly difficult problems by interacting with the LLMs and exploiting common failure modes. The best model, GPT5, solves 30% of the problems; average across 17 models (GPT, Gemini, Claude, DeepSeek, Llama) is 11.4 ± 2.1%. Moreover, 18 problems are solved by none of the 17 models, and 26 by at most one. These unsolved problems span Quantum Monte Carlo, Variational Monte Carlo, and DMRG. Answers sometimes violate fundamental symmetries or have unphysical scaling dimensions. We believe this benchmark will guide development toward capable AI research assistants and tutors.


2510.04883 2026-03-02 cs.RO cs.CV cs.LG

CLEAR-IR: Clarity-Enhanced Active Reconstruction of Infrared Imagery

Nathan Shankar, Pawel Ladosz, Hujun Yin

Comments 8 pages, 6 figures, 2 tables

2510.04855 2026-03-02 cs.LG

Synthesising Counterfactual Explanations via Label-Conditional Gaussian Mixture Variational Autoencoders

Junqi Jiang, Francesco Leofante, Antonio Rago, Francesca Toni

Comments Accepted at ICLR 2026. Camera-ready version

2510.03632 2026-03-02 cs.AI

MITS: Enhanced Tree Search Reasoning for LLMs via Pointwise Mutual Information

Jiaxi Li, Yucheng Shi, Xiao Huang, Jin Lu, Ninghao Liu

Comments 18 pages

2510.00060 2026-03-02 cs.CV cs.AI cs.RO

Less is More: Lean yet Powerful Vision-Language Model for Autonomous Driving

Sheng Yang, Tong Zhan, Guancheng Chen, Yanfeng Lu, Jian Wang

2509.26578 2026-03-02 cs.LG

Linking Process to Outcome: Conditional Reward Modeling for LLM Reasoning

Zheng Zhang, Ziwei Shan, Kaitao Song, Yexin Li, Kan Ren

2509.25249 2026-03-02 cs.RO cs.AI

BEV-VLM: Trajectory Planning via Unified BEV Abstraction

Guancheng Chen, Sheng Yang, Tong Zhan, Jian Wang

2509.24945 2026-03-02 cs.CL cs.AI

MobileLLM-R1: Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes

Changsheng Zhao, Ernie Chang, Zechun Liu, Chia-Jung Chang, Wei Wen, Chen Lai, Sheng Cao, Yuandong Tian, Raghuraman Krishnamoorthi, Yangyang Shi, Vikas Chandra

Comments ICLR 2026

2509.24159 2026-03-02 cs.AI

RE-PO: Robust Enhanced Policy Optimization as a General Framework for LLM Alignment

Xiaoyang Cao, Zelai Xu, Mo Guang, Kaiwen Long, Michiel A. Bakker, Yu Wang, Chao Yu

2509.23735 2026-03-02 cs.AI cs.SE

Demystifying the Lifecycle of Failures in Platform-Orchestrated Agentic Workflows

Xuyan Ma, Xiaofei Xie, Yawen Wang, Junjie Wang, Boyu Wu, Mingyang Li, Qing Wang

2509.23371 2026-03-02 cs.CL cs.AI cs.LG

Alignment through Meta-Weighted Online Sampling: Bridging the Gap between Data Generation and Preference Optimization

Junming Yang, Ning Xu, Biao Liu, Shiqi Qiao, Xin Geng

Comments Accepted by ICLR 2026

2509.23234 2026-03-02 cs.AI cs.CL

p-less Sampling: A Robust Hyperparameter-Free Approach for LLM Decoding

Runyan Tan, Shuang Wu, Phillip Howard

2509.23159 2026-03-02 cs.LG

ProtoTS: Learning Hierarchical Prototypes for Explainable Time Series Forecasting

Ziheng Peng, Shijie Ren, Xinyue Gu, Linxiao Yang, Xiting Wang, Liang Sun

Comments ICLR 2026 Poster

2509.22353 2026-03-02 cs.LG cs.AI

Context and Diversity Matter: The Emergence of In-Context Learning in World Models

Fan Wang, Zhiyuan Chen, Yuxuan Zhong, Sunjian Zheng, Pengtao Shao, Bo Yu, Shaoshan Liu, Jianan Wang, Ning Ding, Yang Cao, Yu Kang

2509.21021 2026-03-02 cs.LG cs.AI stat.ML

Efficient Ensemble Conditional Independence Test Framework for Causal Discovery

Zhengkang Guan, Kun Kuang

Comments Published as a conference paper at ICLR 2026

2509.17010 2026-03-02 cs.RO cs.SY eess.SY

Generalized Momenta-Based Koopman Formalism for Robust Control of Euler-Lagrangian Systems

Rajpal Singh, Aditya Singh, Chidre Shravista Kashyap, Jishnu Keshavan