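arXivDaily: a daily academic digest synced with the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, and electrical engineering & systems.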

2603.07371 2026-03-12 cs.LG cs.AI stat.AP stat.ME stat.ML

ConfHit: Conformal Generative Design with Oracle Free Guarantees

Siddhartha Laghuvarapu, Ying Jin, Jimeng Sun

Comments Accepted at ICLR 2026

Abstract

The success of deep generative models in scientific discovery requires not only the ability to generate novel candidates but also reliable guarantees that these candidates indeed satisfy desired properties. Recent conformal-prediction methods offer a path to such guarantees, but their application to generative modeling in drug discovery is limited by budget constraints, lack of oracle access, and distribution shift. To this end, we introduce ConfHit, a distribution-free framework that provides validity guarantees under these conditions. ConfHit formalizes two central questions: (i) Certification: whether a generated batch can be guaranteed to contain at least one hit with a user-specified confidence level, and (ii) Design: whether the generation can be refined to a compact set without weakening this guarantee. ConfHit leverages weighted exchangeability between historical and generated samples to eliminate the need for an experimental oracle, constructs a multiple-sample density-ratio weighted conformal p-value to quantify statistical confidence in hits, and proposes a nested testing procedure to certify and refine candidate sets of multiple generated samples while maintaining statistical guarantees. Across representative generative molecule design tasks and a broad range of methods, ConfHit consistently delivers valid coverage guarantees at multiple confidence levels while maintaining compact certified sets, establishing a principled and reliable framework for generative modeling.

2603.07181 2026-03-12 cs.CV

FreeFly-Thinking: Aligning Chain-of-Thought Reasoning with Continuous UAV Navigation

Jiaxu Zhou, Shaobo Wang, Zhiyuan Yang, Zhenjun Yu, Tao Li

Comments 10 pages, 5 figures, ECCV review

2603.05621 2026-03-12 cs.RO cs.AI cs.CL cs.LG cs.MA

RACAS: Controlling Diverse Robots With a Single Agentic System

Dylan R. Ashley, Jan Przepióra, Yimeng Chen, Ali Abualsaud, Nurzhan Yesmagambet, Shinkyu Park, Eric Feron, Jürgen Schmidhuber

Comments 7 pages in main text + 1 page of appendices + 1 page of references, 5 figures in main text + 1 figure in appendices, 2 tables in main text; source code available at https://github.com/janprz11/robot-agnostic-control

2603.03920 2026-03-12 cs.LG cs.AI

BD-Merging: Bias-Aware Dynamic Model Merging with Evidence-Guided Contrastive Learning

Yuhan Xie, Chen Lyu

Comments Accepted by CVPR 2026

2603.03507 2026-03-12 cs.LG cond-mat.dis-nn q-bio.NC stat.ML

Solving adversarial examples requires solving exponential misalignment

Alessandro Salvatore, Stanislav Fort, Surya Ganguli

2603.03203 2026-03-12 cs.AI cs.CL

No Memorization, No Detection: Output Distribution-Based Contamination Detection in Small Language Models

Omer Sela

Comments Code available at https://github.com/Sela-Omer/Contamination-Detection-Small-LM

2603.02873 2026-03-12 cs.CL

LaTeX Compilation: Challenges in the Era of LLMs

Tianyou Liu, Ziqiang Li, Xurui Liu, Yu Wu, Yansong Li

Comments 25 pages, 12 figures

2603.02816 2026-03-12 cs.CV cs.AI

BrandFusion: A Multi-Agent Framework for Seamless Brand Integration in Text-to-Video Generation

Zihao Zhu, Ruotong Wang, Siwei Lyu, Min Zhang, Baoyuan Wu

2603.01914 2026-03-12 cs.CL

AdaPonderLM: Gated Pondering Language Models with Token-Wise Adaptive Depth

Shixiang Song, He Li, Zitong Wang, Boyi Zeng, Feichen Song, Yixuan Wang, Zhiqin John Xu, Ziwei He, Zhouhan Lin

2603.01630 2026-03-12 cs.AI stat.AP

SEED-SET: Scalable Evolving Experimental Design for System-level Ethical Testing

Anjali Parashar, Yingke Li, Eric Yang Yu, Fei Chen, James Neidhoefer, Devesh Upadhyay, Chuchu Fan

Comments 10 main pages along with Appendix containing additional results, manuscript accepted in ICLR 2026

2603.01620 2026-03-12 cs.AI

ToolRLA: Multiplicative Reward Decomposition for Tool-Integrated Agents

Pengbo Liu

2603.01607 2026-03-12 cs.AI cs.LG

CARE: Towards Clinical Accountability in Multi-Modal Medical Reasoning with an Evidence-Grounded Agentic Framework

Yuexi Du, Jinglu Wang, Shujie Liu, Nicha C. Dvornek, Yan Lu

Comments Accepted by ICLR 2026

2603.00359 2026-03-12 cs.CL cs.LG

How Large Language Models Get Stuck: Early structure with persistent errors

Alokesh Manna, William Snyder, Whitney Tabor

2602.23529 2026-03-12 cs.LG

Active Value Querying to Minimize Additive Error in Subadditive Set Function Learning

Martin Černý, David Sychrovský, Filip Úradník, Jakub Černý

2602.22740 2026-03-12 cs.CV cs.AI

AMLRIS: Alignment-aware Masked Learning for Referring Image Segmentation

Tongfei Chen, Shuo Yang, Yuguang Yang, Linlin Yang, Runtang Guo, Changbai Li, He Long, Chunyu Xie, Dawei Leng, Baochang Zhang

Comments ICLR 2026 conference paper

2602.21987 2026-03-12 cs.CV cs.AI

PatchDenoiser: Parameter-efficient multi-scale patch learning and fusion denoiser for Low-dose CT imaging

Jitindra Fartiyal, Pedro Freire, Sergei K. Turitsyn, Sergei G. Solovski

2602.19442 2026-03-12 cs.CV

UrbanAlign: Post-hoc Semantic Calibration for VLM-Human Preference Alignment

Yecheng Zhang, Rong Zhao, Zhizhou Sha, Yong Li, Lei Wang, Ce Hou, Wen Ji, Hao Huang, Yunshan Wan, Jian Yu, Junhao Xia, Yuru Zhang, Chunlei Shi

Comments 26 pages

2602.18867 2026-03-12 cs.CV

Similarity-as-Evidence: Calibrating Overconfident VLMs for Interpretable and Label-Efficient Medical Active Learning

Zhuofan Xie, Zishan Lin, Jinliang Lin, Jie Qi, Shaohua Hong, Shuo Li

Comments Accepted to CVPR 2026 (to appear)

2602.18710 2026-03-12 cs.AI cs.LG

Many AI Analysts, One Dataset: Navigating the Agentic Data Science Multiverse

Martin Bertran, Riccardo Fogliato, Zhiwei Steven Wu

Abstract

Empirical conclusions depend not only on data but on analytic decisions made throughout the research process. Many-analyst studies have quantified this dependence: independent teams testing the same hypothesis on the same dataset regularly reach conflicting conclusions. But such studies require costly human coordination and are rarely conducted. We show that fully autonomous AI analysts built on large language models (LLMs) can, cheaply and at scale, replicate the structured analytic diversity observed in human multi-analyst studies. In our framework, each AI analyst independently executes a complete analysis pipeline on a fixed dataset and hypothesis; a separate AI auditor screens every run for methodological validity. Across three datasets spanning distinct domains, AI analyst-produced analyses exhibit substantial dispersion in effect sizes, p-values, and conclusions. This dispersion can be traced to identifiable analytic choices in preprocessing, model specification, and inference that vary systematically across LLM and persona conditions. Critically, the outcomes are steerable: reassigning the analyst persona or LLM shifts the distribution of results even among methodologically sound runs. These results highlight a central challenge for AI-automated empirical science: when defensible analyses are cheap to generate, evidence becomes abundant and vulnerable to selective reporting. Yet the same capability that creates this risk may also help address it: treating analyst results as distributions makes analytic uncertainty visible, and deploying AI analysts against a published specification can reveal how much disagreement stems from underspecified design choices. Taken together, our results motivate a new transparency norm: AI-generated analyses should be accompanied by multiverse-style reporting and full disclosure of the prompts used, on par with code and data.

2602.17312 2026-03-12 cs.LG cs.SY eess.SY

LexiSafe: Offline Safe Reinforcement Learning with Lexicographic Safety-Reward Hierarchy

Hsin-Jung Yang, Zhanhong Jiang, Prajwal Koirala, Qisai Liu, Cody Fleming, Soumik Sarkar

Comments 17th ACM/IEEE International Conference on Cyber-Physical Systems

2602.14552 2026-03-12 cs.CV

OmniVTON++: Training-Free Universal Virtual Try-On with Principal Pose Guidance

Zhaotong Yang, Yong Du, Shengfeng He, Yuhui Li, Xinzhe Li, Yangyang Xu, Junyu Dong, Jian Yang

2602.14482 2026-03-12 cs.CV cs.AI

TikArt: Stabilizing Aperture-Guided Fine-Grained Visual Reasoning with Reinforcement Learning

Hao Ding, Zhichuan Yang, Weijie Ge, Ziqin Gao, Chaoyi Lu, Lei Zhao

2602.14178 2026-03-12 cs.CV cs.AI

UniWeTok: An Unified Binary Tokenizer with Codebook Size $\mathit{2^{128}}$ for Unified Multimodal Large Language Model

Shaobin Zhuang, Yuang Ai, Jiaming Han, Weijia Mao, Xiaohui Li, Fangyikang Wang, Xiao Wang, Yan Li, Shanchuan Lin, Kun Xu, Zhenheng Yang, Huaibo Huang, Xiangyu Yue, Hao Chen, Yali Wang

Comments 29 pages, 9 figures, 33 tables

2602.12566 2026-03-12 cs.AI

To Mix or To Merge: Toward Multi-Domain Reinforcement Learning for Large Language Models

Haoqing Wang, Xiang Long, Ziheng Li, Yilong Xu, Tingguang Li, Yehui Tang

2602.10048 2026-03-12 cs.LG cs.AI

Long Chain-of-Thought Compression via Fine-Grained Group Policy Optimization

Xinchen Han, Hossam Afifi, Michel Marot, Xilu Wang, Lu Yin

Comments IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2026

2602.09375 2026-03-12 cs.LG

Latent Poincaré Shaping for Agentic Reinforcement Learning

Hanchen Xia, Baoyou Chen, Zelin Zang, Yutang Ge, Guojiang Zhao, Siyu Zhu

2602.04746 2026-03-12 cs.RO

Dull, Dirty, Dangerous: Understanding the Past, Present, and Future of a Key Motivation for Robotics

Nozomi Nakajima, Pedro Reynolds-Cuéllar, Caitrin Lynch, Kate Darling

2602.04268 2026-03-12 cs.CV

KVSmooth: Mitigating Hallucination in Multi-modal Large Language Models through Key-Value Smoothing

Siyu Jiang, Feiyang Chen, Xiaojin Zhang, Kun He

Comments Accepted by CVPR 2026

2602.04102 2026-03-12 cs.CV cs.AI

DMS2F-HAD: A Dual-branch Mamba-based Spatial-Spectral Fusion Network for Hyperspectral Anomaly Detection

Aayushma Pant, Lakpa Tamang, Tsz-Kwan Lee, Sunil Aryal

Comments This paper has been accepted in the WACV 2025 conference in algorithm track

2602.02685 2026-03-12 cs.LG

Expert-Data Alignment Governs Generation Quality in Decentralized Diffusion Models

Marcos Villagra, Bidhan Roy, Raihan Seraj, Zhiying Jiang

Comments 15 pages, 4 figures. DeLTa@ICLR2026 and Sci4DL@ICLR2026