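arXivDaily: a daily arXiv digest that syncs the full arXiv feed and provides AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.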

2603.27513 2026-03-31 cs.CV cs.AI

Understanding Semantic Perturbations on In-Processing Generative Image Watermarks

Anirudh Nakra, Min Wu

Abstract

The widespread deployment of high-fidelity generative models has intensified the need for reliable mechanisms for provenance and content authentication. In-processing watermarking, embedding a signature into the generative model's synthesis procedure, has been advocated as a solution and is often reported to be robust to standard post-processing (such as geometric transforms and filtering). Yet robustness to semantic manipulations that alter high-level scene content while maintaining reasonable visual quality is not well studied or understood. We introduce a simple, multi-stage framework for systematically stress-testing in-processing generative watermarks under semantic drift. The framework utilizes off-the-shelf models for object detection, mask generation, and semantically guided inpainting or regeneration to produce controlled, meaning-altering edits with minimal perceptual degradation. Based on extensive experiments on representative schemes, we find that robustness varies significantly with the degree of semantic entanglement: methods whose watermarks remain detectable under a broad suite of conventional perturbations can fail under semantic edits, with watermark detectability in many cases dropping to near zero while image quality remains high. Overall, our results reveal a critical gap in current watermarking evaluations and suggest that watermark designs and benchmarking must explicitly account for robustness against semantic manipulation.
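The staged pipeline described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' framework: `detect` and `inpaint` are hypothetical callables standing in for the off-the-shelf detection and inpainting/regeneration models, masks are plain rectangular boxes, and detectability is measured as the share of images whose decoded watermark bits match the key at or above an assumed bit-accuracy threshold of 0.9.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np

# (row0, col0, row1, col1), half-open; format assumed for a hypothetical detector
Box = Tuple[int, int, int, int]


def boxes_to_mask(shape: Tuple[int, ...], boxes: Sequence[Box]) -> np.ndarray:
    """Stage 2 stand-in: rasterize detected boxes into a boolean mask."""
    mask = np.zeros(shape[:2], dtype=bool)
    for r0, c0, r1, c1 in boxes:
        mask[r0:r1, c0:c1] = True
    return mask


def semantic_edit(
    image: np.ndarray,
    detect: Callable[[np.ndarray], List[Box]],
    inpaint: Callable[[np.ndarray, np.ndarray], np.ndarray],
) -> np.ndarray:
    """Detect objects, mask them, and hand the masked region to an inpainter."""
    boxes = detect(image)                      # stage 1: object detection
    if not boxes:
        return image.copy()                    # nothing to edit
    mask = boxes_to_mask(image.shape, boxes)   # stage 2: mask generation
    return inpaint(image, mask)                # stage 3: inpainting / regeneration


def bit_accuracy(decoded: np.ndarray, key: np.ndarray) -> float:
    """Fraction of watermark bits recovered correctly."""
    return float(np.mean(np.asarray(decoded) == np.asarray(key)))


def detection_rate(
    images: Sequence[np.ndarray],
    decode: Callable[[np.ndarray], np.ndarray],
    key: np.ndarray,
    threshold: float = 0.9,
) -> float:
    """Share of images whose bit accuracy reaches the detection threshold."""
    hits = [bit_accuracy(decode(img), key) >= threshold for img in images]
    return float(np.mean(hits))
```

As a toy check, one can embed an 8-bit key in the least significant bits of an image's top row and "inpaint" that row with zeros. The detection rate then falls from 1.0 to 0.0, because only the key's three zero bits still match (bit accuracy 0.375). The LSB watermark here is deliberately naive and is not an in-processing scheme; it only makes the effect of a content-altering edit visible.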

2603.27510 2026-03-31 cs.LG

Decomposing Discrimination: Causal Mediation Analysis for AI-Driven Credit Decisions

Duraimurugan Rajamanickam

Comments 22 pages, 6 figures, 2 tables. Open-source code at https://github.com/rdmurugan/causalfair-repo

2603.27508 2026-03-31 cs.SD

Investigation on the Robustness of Acoustic Foundation Models on Post Exercise Speech

Xiangyuan Xue, Yuyu Wang, Ruijie Yao, Xiaoyue Ni, Xiaofan Jiang, Jingping Nie

2603.27504 2026-03-31 cs.CV

Transferring Physical Priors into Remote Sensing Segmentation via Large Language Models

Yuxi Lu, Kunqi Li, Zhidong Li, Xiaohan Su, Biao Wu, Chenya Huang, Bin Liang

2603.27500 2026-03-31 cs.CV

Streamlined Open-Vocabulary Human-Object Interaction Detection

Chang Sun, Dongliang Liao, Changxing Ding

2603.27490 2026-03-31 cs.CL cs.AI cs.MA

AgentSwing: Adaptive Parallel Context Management Routing for Long-Horizon Web Agents

Zhaopeng Feng, Liangcai Su, Zhen Zhang, Xinyu Wang, Xiaotian Zhang, Xiaobin Wang, Runnan Fang, Qi Zhang, Baixuan Li, Shihao Cai, Rui Ye, Hui Chen, Jiang Yong, Joey Tianyi Zhou, Chenxiong Qian, Pengjun Xie, Bryan Hooi, Zuozhu Liu, Jingren Zhou

2603.27488 2026-03-31 cs.LG

Variational Learning of Fractional Posteriors

Kian Ming A. Chai, Edwin V. Bonilla

Comments Initial version in Proceedings of the 42nd International Conference on Machine Learning, Vancouver, Canada. PMLR 267, 2025. This version contains a correction to Lemma A.1 and amendments to two surrounding passages: see the last page of the paper on the accompanying GitHub website

2603.27486 2026-03-31 cs.CV stat.AP

Estimating the Impact of COVID-19 on Travel Demand in Houston Area Using Deep Learning and Satellite Imagery

Alekhya Pachika, Lu Gao, Lingguang Song, Pan Lu, Xingju Wang

2603.27482 2026-03-31 cs.CV cs.AI

Difference Feedback: Generating Multimodal Process-Level Supervision for VLM Reinforcement Learning

Feiding, Yongkang Zhang, Yuhao Liao, Zijian Zeng, Chunzheng Zhu, Yaozong Zheng, Yafei Liu, Yeling Peng, Youwei Wang, Sibo Wang, Huiming Yang, Linglin Liao, Shunzhi Yang

2603.27481 2026-03-31 cs.LG cs.AI

On Token's Dilemma: Dynamic MoE with Drift-Aware Token Assignment for Continual Learning of Large Vision Language Models

Chongyang Zhao, Mingsong Li, Haodong Lu, Dong Gong

Comments Accepted at CVPR 2026

2603.27476 2026-03-31 cs.AI cs.LG

PeopleSearchBench: A Multi-Dimensional Benchmark for Evaluating AI-Powered People Search Platforms

Wei Wang, Tianyu Shi, Shuai Zhang, Boyang Xia, Zequn Xie, Chenyu Zeng, Qi Zhang, Lynn Ai, Yaqi Yu, Kaiming Zhang, Feiyue Tang

Comments 25 pages

2603.27469 2026-03-31 cs.LG cs.AI

KV Cache Quantization for Self-Forcing Video Generation: A 33-Method Empirical Study

Suraj Ranganath, Vaishak Menon, Anish Patnaik

Abstract

Self-forcing video generation extends a short-horizon video model to longer rollouts by repeatedly feeding generated content back in as context. This scaling path immediately exposes a systems bottleneck: the key-value (KV) cache grows with rollout length, so longer videos require not only better generation quality but also substantially better memory behavior. We present a comprehensive empirical study of KV-cache compression for self-forcing video generation on a Wan2.1-based Self-Forcing stack. Our study covers 33 quantization and cache-policy variants, 610 prompt-level observations, and 63 benchmark-level summaries across two evaluation settings: MovieGen for single-shot 10-second generation and StoryEval for longer narrative-style stability. We jointly evaluate peak VRAM, runtime, realized compression ratio, VBench imaging quality, BF16-referenced fidelity (SSIM, LPIPS, PSNR), and terminal drift. Three findings are robust. First, the strongest practical operating region is a FlowCache-inspired soft-prune INT4 adaptation, which reaches 5.42-5.49x compression while reducing peak VRAM from 19.28 GB to about 11.7 GB with only modest runtime overhead. Second, the highest-fidelity compressed methods, especially PRQ_INT4 and QUAROT_KV_INT4, are not the best deployment choices because they preserve quality at severe runtime or memory cost. Third, nominal compression alone is not sufficient: several methods shrink KV storage but still exceed BF16 peak VRAM because the current integration reconstructs or retains large BF16 buffers during attention and refresh stages. The result is a benchmark harness, analysis workflow, and empirical map of which KV-cache ideas are practical today and which are promising research directions for better memory integration. Code, data products, and the presentation dashboard are available at https://github.com/suraj-ranganath/kv-quant-longhorizon/.
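To make the compression-ratio terminology concrete, here is a hedged sketch of generic symmetric per-group INT4 quantization of a KV tensor. It is not one of the 33 methods in the study (nor PRQ_INT4, QUAROT_KV_INT4, or the FlowCache-inspired soft-prune variant). The group size of 64 and the FP16 per-group scale are assumptions chosen for illustration. The helper `nominal_compression_ratio` counts storage bits only. `dequantize` materializes a full-size floating-point copy, which illustrates why the third finding holds: keeping such buffers alive can erase the nominal savings at peak.

```python
from typing import Tuple

import numpy as np


def quantize_int4_groupwise(x: np.ndarray, group_size: int = 64) -> Tuple[np.ndarray, np.ndarray]:
    """Symmetric per-group INT4 quantization (illustrative, not from the study).

    Codes lie in [-8, 7]; each group of `group_size` consecutive values shares
    one FP16 scale. Codes are held in int8 here for clarity; a real kernel
    would pack two 4-bit codes per byte.
    """
    if x.size % group_size:
        raise ValueError("tensor size must be a multiple of group_size")
    flat = x.astype(np.float32).reshape(-1, group_size)
    max_abs = np.abs(flat).max(axis=1, keepdims=True)
    scale = np.where(max_abs > 0, max_abs / 7.0, 1.0)  # avoid a zero scale
    codes = np.clip(np.round(flat / scale), -8, 7).astype(np.int8)
    return codes, scale.astype(np.float16)


def dequantize(codes: np.ndarray, scale: np.ndarray, shape: Tuple[int, ...]) -> np.ndarray:
    """Rebuild a full-size float32 tensor; this buffer is what costs peak memory."""
    return (codes.astype(np.float32) * scale.astype(np.float32)).reshape(shape)


def nominal_compression_ratio(
    group_size: int = 64, base_bits: int = 16, code_bits: int = 4, scale_bits: int = 16
) -> float:
    """Storage-only ratio versus a 16-bit (e.g. BF16) cache, assuming packed codes."""
    return base_bits / (code_bits + scale_bits / group_size)
```

With 4-bit codes and one 16-bit scale per 64 values, each element costs 4 + 16/64 = 4.25 bits. The nominal ratio against 16-bit storage is therefore 16 / 4.25 ≈ 3.76x. Higher ratios, such as the 5.42-5.49x reported for the soft-prune INT4 variant, presumably reflect additional cache-policy effects such as pruning, which this sketch does not model.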

2603.27467 2026-03-31 cs.LG cs.AI

TurboAngle: Near-Lossless KV Cache Compression via Uniform Angle Quantization

Dipkumar Patel

Comments 10 pages, 7 tables, 2 figures

2603.27460 2026-03-31 cs.CV cs.AI

Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development

Zhongying Deng, Cheng Tang, Ziyan Huang, Jiashi Lin, Ying Chen, Junzhi Ning, Chenglong Ma, Jiyao Liu, Wei Li, Yinghao Zhu, Shujian Gao, Yanyan Huang, Sibo Ju, Yanzhou Su, Pengcheng Chen, Wenhao Tang, Tianbin Li, Haoyu Wang, Yuanfeng Ji, Hui Sun, Shaobo Min, Liang Peng, Feilong Tang, Haochen Xue, Rulin Zhou, Chaoyang Zhang, Wenjie Li, Shaohao Rui, Weijie Ma, Xingyue Zhao, Yibin Wang, Kun Yuan, Zhaohui Lu, Shujun Wang, Jinjie Wei, Lihao Liu, Dingkang Yang, Lin Wang, Yulong Li, Haolin Yang, Yiqing Shen, Lequan Yu, Xiaowei Hu, Yun Gu, Yicheng Wu, Benyou Wang, Minghui Zhang, Angelica I. Aviles-Rivero, Qi Gao, Hongming Shan, Xiaoyu Ren, Fang Yan, Hongyu Zhou, Haodong Duan, Maosong Cao, Shanshan Wang, Bin Fu, Xiaomeng Li, Zhi Hou, Chunfeng Song, Lei Bai, Yuan Cheng, Yuandong Pu, Xiang Li, Wenhai Wang, Hao Chen, Jiaxin Zhuang, Songyang Zhang, Huiguang He, Mengzhang Li, Bohan Zhuang, Zhian Bai, Rongshan Yu, Liansheng Wang, Yukun Zhou, Xiaosong Wang, Xin Guo, Guanbin Li, Xiangru Lin, Dakai Jin, Mianxin Liu, Wenlong Zhang, Qi Qin, Conghui He, Yuqiang Li, Ye Luo, Nanqing Dong, Jie Xu, Wenqi Shao, Bo Zhang, Qiujuan Yan, Yihao Liu, Jun Ma, Zhi Lu, Yuewen Cao, Zongwei Zhou, Jianming Liang, Shixiang Tang, Qi Duan, Dongzhan Zhou, Chen Jiang, Yuyin Zhou, Yanwu Xu, Jiancheng Yang, Shaoting Zhang, Xiaohong Liu, Siqi Luo, Yi Xin, Chaoyu Liu, Haochen Wen, Xin Chen, Alejandro Lozano, Min Woo Sun, Yuhui Zhang, Yue Yao, Xiaoxiao Sun, Serena Yeung-Levy, Xia Li, Jing Ke, Chunhui Zhang, Zongyuan Ge, Ming Hu, Jin Ye, Zhifeng Li, Yirong Chen, Yu Qiao, Junjun He

Comments 157 pages, 19 figures, 26 tables. Project repo: https://github.com/uni-medical/Project-Imaging-X

Abstract

Foundation models have demonstrated remarkable success across diverse domains and tasks, primarily owing to the availability of large-scale, diverse, and high-quality datasets. In medical imaging, however, curating and assembling such datasets is highly challenging because it relies on clinical expertise and is subject to strict ethical and privacy constraints; the result is a scarcity of large-scale unified medical datasets, which hinders the development of powerful medical foundation models. In this work, we present the largest survey to date of medical image datasets, covering over 1,000 open-access datasets with a systematic catalog of their modalities, tasks, anatomies, annotations, limitations, and potential for integration. Our analysis exposes a landscape that is modest in scale, fragmented across narrowly scoped tasks, and unevenly distributed across organs and modalities, which in turn limits the utility of existing medical image datasets for developing versatile and robust medical foundation models. To turn fragmentation into scale, we propose a metadata-driven fusion paradigm (MDFP) that integrates public datasets with shared modalities or tasks, thereby transforming multiple small data silos into larger, more coherent resources. Building on MDFP, we release an interactive discovery portal that enables end-to-end, automated medical image dataset integration, and compile all surveyed datasets into a unified, structured table that clearly summarizes their key characteristics and provides reference links, offering the community an accessible and comprehensive repository. By charting the current terrain and offering a principled path to dataset consolidation, our survey provides a practical roadmap for scaling medical imaging corpora, supporting faster data discovery, more principled dataset creation, and more capable medical foundation models.
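The core idea of metadata-driven fusion, pooling datasets that share a modality and task, can be illustrated with a minimal sketch. The `DatasetMeta` record, its field names, the normalization (whitespace stripping and case-folding), and the default grouping keys are assumptions for illustration. The actual MDFP pipeline and its discovery portal are not modeled here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class DatasetMeta:
    """Hypothetical catalog record; field names are illustrative."""
    name: str
    modality: str
    task: str
    anatomy: str
    num_images: int


def fuse_by_metadata(
    datasets: Iterable[DatasetMeta], keys: Tuple[str, ...] = ("modality", "task")
) -> Dict[Tuple[str, ...], List[DatasetMeta]]:
    """Group records whose normalized metadata agree on every key field."""
    groups: Dict[Tuple[str, ...], List[DatasetMeta]] = defaultdict(list)
    for d in datasets:
        key = tuple(getattr(d, k).strip().lower() for k in keys)
        groups[key].append(d)
    return dict(groups)


def pooled_sizes(groups: Dict[Tuple[str, ...], List[DatasetMeta]]) -> Dict[Tuple[str, ...], int]:
    """Total image count per fused group."""
    return {k: sum(d.num_images for d in v) for k, v in groups.items()}
```

For instance, made-up records A ("CT", "segmentation", 100 images) and B ("ct", "Segmentation", 50 images) fall into the same `("ct", "segmentation")` group with 150 images in total, while an MRI classification set stays separate. Grouping on more keys (for example adding `"anatomy"`) yields smaller but more homogeneous pools.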

2603.27452 2026-03-31 cs.RO

Robotic Dexterous Manipulation via Anisotropic Friction Modulation using Passive Rollers

Ethan Fisk, Taeyoon Lee, Shenli Yuan

Comments 2026 IEEE International Conference on Robotics & Automation

2603.27451 2026-03-31 cs.CL cs.AI

Multi-Agent Dialectical Refinement for Enhanced Argument Classification

Jakub Bąba, Jarosław A. Chudziak

Comments Accepted for publication in the proceedings of ACIIDS 2026

2603.27449 2026-03-31 cs.CV

LOME: Learning Human-Object Manipulation with Action-Conditioned Egocentric World Model

Quankai Gao, Jiawei Yang, Qiangeng Xu, Le Chen, Yue Wang

2603.27448 2026-03-31 cs.LG cs.AI cs.CE

GIFT: Bootstrapping Image-to-CAD Program Synthesis via Geometric Feedback

Giorgio Giannone, Anna Clare Doris, Amin Heyrani Nobari, Kai Xu, Akash Srivastava, Faez Ahmed

Comments preprint

2603.27442 2026-03-31 cs.LG cs.SY eess.SY

Interpretable Physics Extraction from Data for Linear Dynamical Systems using Lie Generator Networks

Shafayeth Jamil, Rehan Kapadia

Comments 20 pages, 6 figures

2603.27441 2026-03-31 cs.CV cs.AI

Evaluating Large and Lightweight Vision Models for Irregular Component Segmentation in E-Waste Disassembly

Xinyao Zhang, Chang Liu, Xiao Liang, Minghui Zheng, Sara Behdad

Comments Accepted at ASME MSEC2026

2603.27438 2026-03-31 cs.AI

The Novelty Bottleneck: A Framework for Understanding Human Effort Scaling in AI-Assisted Work

Jacky Liang

2603.27435 2026-03-31 cs.CL cs.AI

Improving Attributed Long-form Question Answering with Intent Awareness

Xinran Zhao, Aakanksha Naik, Jay DeYoung, Joseph Chee Chang, Jena D. Hwang, Tongshuang Wu, Varsha Kishore

Comments 39 pages, 7 figures

2603.27432 2026-03-31 cs.LG cs.IT math.IT

The Geometric Cost of Normalization: Affine Bounds on the Bayesian Complexity of Neural Networks

Sungbae Chun

Comments 12 pages, 2 figures

2603.27429 2026-03-31 cs.CV

Mind the Shape Gap: A Benchmark and Baseline for Deformation-Aware 6D Pose Estimation of Agricultural Produce

Nikolas Chatzis, Angeliki Tsinouka, Katerina Papadimitriou, Niki Efthymiou, Marios Glytsos, George Retsinas, Paris Oikonomou, Gerasimos Potamianos, Petros Maragos, Panagiotis Paraskevas Filntisis

2603.27423 2026-03-31 cs.AI cs.SE

AstraAI: LLMs, Retrieval, and AST-Guided Assistance for HPC Codebases

Mahesh Natarajan, Xiaoye Li, Weiqun Zhang

Comments 10 pages, 5 figures

2603.27422 2026-03-31 cs.RO

Predictive Modeling in AUV Navigation: A Perspective from Kalman Filtering

Zizhan Tang, Yao Liu, Jessica Liu

Comments 7 pages, 9 figures

2603.27417 2026-03-31 cs.LG

Kempe Swap K-Means: A Scalable Near-Optimal Solution for Semi-Supervised Clustering

Yuxuan Ren, Shijie Deng

Comments 42 pages

2603.27416 2026-03-31 cs.RO cs.AI

Agent-Driven Autonomous Reinforcement Learning Research: Iterative Policy Improvement for Quadruped Locomotion

Nimesh Khandelwal, Shakti S. Gupta

2603.27415 2026-03-31 cs.AI stat.CO

Greedy Is a Strong Default: Agents as Iterative Optimizers

Yitao Li

2603.27412 2026-03-31 cs.LG cs.AI cs.CL

The Geometry of Harmful Intent: Training-Free Anomaly Detection via Angular Deviation in LLM Residual Streams

Isaac Llorente-Saguer

Comments 20 pages, 10 figures, 3 tables. Training-free harmful-prompt detector via angular deviation in LLM residual streams. Evaluated on six Qwen variants (base / instruct / abliterated). Achieves AUROC above 0.937 (harmful-vs-normative) and of 1.000 (harmful-vs-benign-aggressive) without any harmful training data
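One plausible reading of "angular deviation in residual streams" is sketched below. A reference direction is built from the mean of benign residual-stream activations at some layer. Each prompt is then scored by the angle between its activation and that direction, and the scores are evaluated with AUROC. The choice of layer, the use of the benign mean as reference, and the toy 3-dimensional vectors in the check are all assumptions. This is not the author's detector, and a toy AUROC says nothing about the reported results.

```python
import numpy as np


def reference_direction(benign_acts: np.ndarray) -> np.ndarray:
    """Unit vector along the mean benign activation (assumed reference)."""
    mu = benign_acts.mean(axis=0)
    return mu / np.linalg.norm(mu)


def angular_deviation(acts: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """Angle in radians between each row of `acts` and the unit vector `ref`."""
    unit = acts / np.linalg.norm(acts, axis=1, keepdims=True)
    cos = np.clip(unit @ ref, -1.0, 1.0)  # clip guards arccos against rounding
    return np.arccos(cos)


def auroc(pos_scores, neg_scores) -> float:
    """Probability that a positive outscores a negative, counting ties as 1/2."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))
```

Scoring by angle rather than Euclidean distance ignores activation norm, which can vary with prompt length and layer. No training on harmful examples is needed because only benign activations define the reference.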