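arXivDaily daily academic digest: syncs the full arXiv feed and provides AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.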

2511.11286 2026-03-12 cs.CV cs.AI

D-GAP: Improving Out-of-Domain Robustness via Dataset-Agnostic and Gradient-Guided Augmentation in Frequency and Pixel Spaces

Ruoqi Wang, Haitao Wang, Shaojie Guo, Qiong Luo

English abstract

Out-of-domain (OOD) robustness is challenging to achieve in real-world computer vision applications, where shifts in image background, style, and acquisition instruments always degrade model performance. Generic augmentations show inconsistent gains under such shifts, whereas dataset-specific augmentations require expert knowledge and prior analysis. Moreover, prior studies show that neural networks adapt poorly to domain shifts because they exhibit a learning bias toward domain-specific frequency components. Perturbing frequency values can mitigate such bias but overlooks pixel-level details, leading to suboptimal performance. To address these problems, we propose D-GAP, a Dataset-agnostic and Gradient-guided augmentation method for the Amplitude spectrum (in frequency space) and the Pixel values, improving OOD robustness by introducing targeted augmentation in both frequency and pixel spaces. Unlike conventional handcrafted augmentations, D-GAP computes sensitivity maps in the frequency space from task gradients, which reflect how strongly the deep models respond to different frequency components, and uses the maps to adaptively interpolate amplitudes between source and target samples. This way, D-GAP reduces the learning bias in frequency space, while a complementary pixel-space blending procedure restores fine spatial details. Extensive experiments on four real-world datasets and three domain-adaptation benchmarks show that D-GAP consistently outperforms both generic and dataset-specific domain adaptation methods, improving average OOD performance by +5.3% on real-world datasets and +1.9% on benchmark datasets.
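The abstract describes the frequency-space step only at a high level. Below is a minimal NumPy sketch of gradient-guided amplitude interpolation followed by pixel blending, written under stated assumptions; it is not the authors' implementation. Assumptions: images are single-channel 2-D arrays; the sensitivity map is approximated by the min-max-normalized magnitude of the 2-D DFT of a pixel-space loss gradient (the paper's exact definition is not given here); frequencies with higher sensitivity are interpolated more strongly toward the target amplitude; the source phase is kept unchanged; and the pixel-space step is a simple convex blend with the source image. The function names `frequency_sensitivity`, `amplitude_mix`, and `pixel_blend` and the weights `alpha` and `beta` are hypothetical.

```python
import numpy as np


def frequency_sensitivity(pixel_grad: np.ndarray) -> np.ndarray:
    """Assumed proxy for a frequency-space sensitivity map.

    Takes a pixel-space loss gradient (2-D array) and returns the magnitude
    of its 2-D DFT, min-max scaled to [0, 1]. This is an illustrative proxy,
    not the paper's formula.
    """
    s = np.abs(np.fft.fft2(pixel_grad))
    span = s.max() - s.min()
    # A constant map (e.g. an all-zero gradient) carries no information.
    return (s - s.min()) / span if span > 0 else np.zeros_like(s)


def amplitude_mix(x_src: np.ndarray, x_tgt: np.ndarray,
                  sensitivity: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Interpolate the source amplitude toward the target amplitude.

    Assumption: the per-frequency weight is alpha * sensitivity, so more
    sensitive frequencies move further toward the target. The source phase
    is kept, which preserves the source image's spatial layout.
    """
    f_src = np.fft.fft2(x_src)
    f_tgt = np.fft.fft2(x_tgt)
    amp_src, phase_src = np.abs(f_src), np.angle(f_src)
    amp_tgt = np.abs(f_tgt)
    lam = alpha * sensitivity  # per-frequency weight in [0, alpha]
    amp_mix = (1.0 - lam) * amp_src + lam * amp_tgt
    # For real inputs and a symmetric weight map the spectrum stays
    # Hermitian, so the inverse FFT is real up to rounding error.
    return np.real(np.fft.ifft2(amp_mix * np.exp(1j * phase_src)))


def pixel_blend(x_aug: np.ndarray, x_src: np.ndarray,
                beta: float = 0.3) -> np.ndarray:
    """Assumed pixel-space step: convex blend of the augmented image with the
    source image to reintroduce fine spatial detail."""
    return (1.0 - beta) * x_aug + beta * x_src


# Example usage on random 8x8 "images" (a random array stands in for a gradient).
rng = np.random.default_rng(0)
x_src, x_tgt = rng.random((8, 8)), rng.random((8, 8))
sens = frequency_sensitivity(rng.standard_normal((8, 8)))
x_out = pixel_blend(amplitude_mix(x_src, x_tgt, sens, alpha=0.5), x_src, beta=0.3)
```

Keeping the source phase while changing only the amplitude is a common choice in Fourier-based augmentation, since amplitude is often taken to carry style and texture while phase carries structure; the final pixel blend is just one simple way to restore detail lost in the frequency step.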

2511.09433 2026-03-12 cs.AI

What We Don't C: Manifold Disentanglement for Structured Discovery

Brian Rogers, Micah Bowles, Chris J. Lintott, Steve Croft, Oliver N. F. King, James Kostas Ray

Comments: v2 is a preprint of the extended version (21 pages); v1, a short version, was accepted to the Machine Learning and the Physical Sciences workshop at NeurIPS 2025 (Number 315: https://ml4physicalsciences.github.io/2025/)

2511.08502 2026-03-12 cs.RO cs.SY eess.SY

Safe and Optimal Learning from Preferences via Weighted Temporal Logic with Applications in Robotics and Formula 1

Ruya Karagulle, Cristian-Ioan Vasile, Necmiye Ozay

Comments: 8 pages, 2 figures

2511.05271 2026-03-12 cs.CV cs.AI

DeepEyesV2: Toward Agentic Multimodal Model

Jack Hong, Chenxiao Zhao, ChengLin Zhu, Weiheng Lu, Guohai Xu, Xing Yu

Comments: Accepted to ICLR 2026. Homepage: https://visual-agent.github.io/

2511.01815 2026-03-12 cs.CL cs.AI cs.LG

KV Cache Transform Coding for Compact Storage in LLM Inference

Konrad Staniszewski, Adrian Łańcucki

Comments: Accepted to ICLR 2026

2510.26299 2026-03-12 cs.SD eess.AS

Modeling strategies for speech enhancement in the latent space of a neural audio codec

Sofiene Kammoun, Xavier Alameda-Pineda, Simon Leglaive

2510.23914 2026-03-12 cs.LG

Revisiting Value Iteration: Unified Analysis of Discounted and Average-Reward Cases

Arsenii Mustafin, Xinyi Sheng, Dominik Baumann

Comments: 22 pages, 2 figures

2510.20508 2026-03-12 cs.CL

Assessing the Political Fairness of Multilingual LLMs: A Case Study based on a 21-way Multiparallel EuroParl Dataset

Paul Lerner, François Yvon

Comments: Accepted at LREC 2026. Added results with new models and a two-way ANOVA; conclusions unchanged

2510.18466 2026-03-12 cs.CL

CEFR-Annotated WordNet: LLM-Based Proficiency-Guided Semantic Database for Language Learning

Masato Kikuchi, Masatsugu Ono, Toshioki Soga, Tetsu Tanabe, Tadachika Ozono

Comments: The 15th edition of the Language Resources and Evaluation Conference (LREC 2026); resources are available at https://doi.org/10.5281/zenodo.17395388

2510.16410 2026-03-12 cs.CV

REALM: An MLLM-Agent Framework for Open World 3D Reasoning Segmentation and Editing on Gaussian Splatting

Changyue Shi, Minghao Chen, Yiping Mao, Chuxiao Yang, Xinyuan Hu, Jiajun Ding, Zhou Yu

Comments: Accepted to CVPR 2026

2510.16325 2026-03-12 cs.CV

UltraGen: Efficient Ultra-High-Resolution Image Generation with Hierarchical Local Attention

Yuyao Zhang, Yu-Wing Tai

Comments: 28 pages

2510.14878 2026-03-12 cs.LG cs.AI

Predicting kernel regression learning curves from only raw data statistics

Dhruva Karkada, Joseph Turnbull, Yuxi Liu, James B. Simon

Comments: Appeared in ICLR 2026

2510.13702 2026-03-12 cs.CV cs.AI

MVCustom: Multi-View Customized Diffusion via Geometric Latent Rendering and Completion

Minjung Shin, Hyunin Cho, Sooyeon Go, Jin-Hwa Kim, Youngjung Uh

Comments: ICLR 2026. Project page: https://minjung-s.github.io/mvcustom

2510.13065 2026-03-12 cs.LG stat.ML

Absolute indices for determining compactness, separability and number of clusters

Adil M. Bagirov, Ramiz M. Aliguliyev, Nargiz Sultanova, Sona Taheri

Comments: 25 pages, 11 figures, 9 tables

2510.08907 2026-03-12 cs.CL

Autoencoding-Free Context Compression for LLMs via Contextual Semantic Anchors

Xin Liu, Runsong Zhao, Pengcheng Huang, Xinyu Liu, Junyi Xiao, Chunyang Xiao, Tong Xiao, Shengxiang Gao, Zhengtao Yu, Jingbo Zhu

Comments: 23 pages, 10 figures

2510.05748 2026-03-12 cs.LG

Communication Enables Cooperation in LLM Agents: A Comparison with Curriculum-Based Approaches

Hachem Madmoun, Salem Lahlou

Comments: Published in EACL 2026. Corrected cooperation rates for the two-stage communication conditions (96.7% and 100.0%; previously reported as 48.3% and 50.0% due to a denominator bug in the evaluation code). All other results are unchanged

2510.03666 2026-03-12 cs.CV cs.AI

MonitorVLM: A Vision Language Framework for Safety Violation Detection in Mining Operations

Jiang Wu, Sichao Wu, Yinsong Ma, Guangyuan Yu, Haoyuan Xu, Lifang Zheng, Jingliang Duan

2510.00681 2026-03-12 cs.CV

Adaptive Event Stream Slicing for Open-Vocabulary Event-Based Object Detection via Vision-Language Knowledge Distillation

Jinchang Zhang, Zijun Li, Jiakai Lin, Guoyu Lu

2510.00379 2026-03-12 cs.LG

Composer: A Search Framework for Hybrid Neural Architecture Design

Bilge Acun, Prasoon Sinha, Newsha Ardalani, Sangmin Bae, Alicia Golden, Chien-Yu Lin, Meghana Madhyastha, Fei Sun, Neeraja J. Yadwadkar, Carole-Jean Wu

2510.00307 2026-03-12 cs.AI

BiasBusters: Uncovering and Mitigating Tool Selection Bias in Large Language Models

Thierry Blankenstein, Jialin Yu, Zixuan Li, Vassilis Plachouras, Sunando Sengupta, Philip Torr, Yarin Gal, Alasdair Paren, Adel Bibi

Comments: ICLR 2026 camera-ready

2509.26251 2026-03-12 cs.CV

Seeing Space and Motion: Enhancing Latent Actions with Geometric and Dynamic Awareness for Vision-Language-Action Models

Zhejia Cai, Yandan Yang, Xinyuan Chang, Shiyi Liang, Ronghan Chen, Feng Xiong, Mu Xu, Ruqi Huang

Comments: 8 pages; corrected errors and clarified details

2509.25426 2026-03-12 cs.AI cs.LG

RADAR: Reasoning-Ability and Difficulty-Aware Routing for Reasoning LLMs

Nigel Fernandez, Branislav Kveton, Ryan A. Rossi, Andrew S. Lan, Zichao Wang

Comments: ICLR 2026

2509.24962 2026-03-12 cs.LG stat.ML

Overlap-Adaptive Regularization for Conditional Average Treatment Effect Estimation

Valentyn Melnychuk, Dennis Frauen, Jonas Schweisthal, Stefan Feuerriegel

2509.24483 2026-03-12 cs.LG

One-Prompt Strikes Back: Sparse Mixture of Experts for Prompt-based Continual Learning

Minh Le, Bao-Ngoc Dao, Huy Nguyen, Quyen Tran, Anh Nguyen, Nhat Ho

Comments: Accepted to ICLR 2026

2509.24224 2026-03-12 cs.LG

Proposing a Framework for Machine Learning Adoption on Legacy Systems

Ashiqur Rahman, Hamed Alhoori

Comments: Accepted at the First International Workshop on Resilient Artificial Intelligence for Manufacturing (ICDM'25)

2509.23719 2026-03-12 cs.CV

PD-Diag-Net: Clinical-Priors guided Network on Brain MRI for Auxiliary Diagnosis of Parkinson's Disease

Shuai Shao, Yan Wang, Shu Jiang, Shiyuan Zhao, Di Yang, Jiangtao Wang, Yutong Bai, Jianguo Zhang

2509.23610 2026-03-12 cs.SD cs.CV

Efficient Audio-Visual Speech Separation with Discrete Lip Semantics and Multi-Scale Global-Local Attention

Kai Li, Kejun Gao, Xiaolin Hu

Comments: Accepted to ICLR 2026

2509.23499 2026-03-12 cs.CV cs.CL cs.LG

Multi-modal Data Spectrum: Multi-modal Datasets are Multi-dimensional

Divyam Madaan, Varshan Muhunthan, Kyunghyun Cho, Sumit Chopra

Comments: Accepted to ICLR 2026. Code available at https://github.com/divyam3897/multimodal-spectrum

2509.22953 2026-03-12 cs.LG stat.ML

GDR-learners: Orthogonal Learning of Generative Models for Potential Outcomes

Valentyn Melnychuk, Stefan Feuerriegel

2509.22699 2026-03-12 cs.CL

Are you sure? Measuring models bias in content moderation through uncertainty

Alessandra Urbinati, Mirko Lai, Simona Frenda, Marco Antonio Stranisci

Comments: Accepted at Findings of ACL: EMNLP 2025