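arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.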

2510.03721 2026-03-31 cs.CV cs.CL cs.CY cs.LG

Person-Centric Annotations of LAION-400M: Auditing Bias and Its Transfer to Models

Leander Girrbach, Stephan Alaniz, Genevieve Smith, Trevor Darrell, Zeynep Akata

Comments ICLR 2026

Abstract

Vision-language models trained on large-scale multimodal datasets show strong demographic biases, but the role of training data in producing these biases remains unclear. A major barrier has been the lack of demographic annotations in web-scale datasets such as LAION-400M. We address this gap by creating person-centric annotations for the full dataset, including over 276 million bounding boxes, perceived gender and race/ethnicity labels, and automatically generated captions. These annotations are produced through validated automatic labeling pipelines combining object detection, multimodal captioning, and finetuned classifiers. Using them, we uncover demographic imbalances and harmful associations, such as the disproportionate linking of men and individuals perceived as Black or Middle Eastern with crime-related and negative content. We also show that a linear fit predicts 60-70% of gender bias in CLIP and Stable Diffusion from direct co-occurrences in the data. Our resources establish the first large-scale empirical link between dataset composition and downstream model bias. Code is available at https://github.com/ExplainableML/LAION-400M-Person-Centric-Annotations.

2509.25848 2026-03-31 cs.CV cs.AI

More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models

Xinyu Tian, Shu Zou, Zhaoyuan Yang, Mengqi He, Fabian Waschkowski, Lukas Wesemann, Peter Tu, Jing Zhang

Comments Accepted to ICLR 2026

2509.24305 2026-03-31 cs.LG cs.DC math.OC

Asynchronous Policy Gradient Aggregation for Efficient Distributed Reinforcement Learning

Alexander Tyurin, Andrei Spiridonov, Varvara Rudenko

2509.23362 2026-03-31 cs.CL cs.AI

Dual-Space Smoothness for Robust and Balanced LLM Unlearning

Han Yan, Zheyuan Liu, Meng Jiang

Comments Accepted by ICLR 2026

2509.22381 2026-03-31 cs.LG

Enhancing Credit Risk Prediction: A Multi-stage Ensemble Pipeline

Haibo Wang, Jun Huang, Lutfu S. Sua, Figen Balo, Burak Dolar

Comments 39 pages

2509.21309 2026-03-31 cs.CV

NewtonGen: Physics-Consistent and Controllable Text-to-Video Generation via Neural Newtonian Dynamics

Yu Yuan, Xijun Wang, Tharindu Wickremasinghe, Zeeshan Nadir, Bole Ma, Stanley H. Chan

Comments Accepted by ICLR 2026. Camera-ready version. Project Page: https://yuyuanspace.com/NewtonGen/

2509.18387 2026-03-31 cs.CV

BlurBall: Joint Ball and Motion Blur Estimation for Table Tennis Ball Tracking

Thomas Gossard, Filip Radovic, Andreas Ziegler, Andreas Zell

Comments Accepted to CVPRW 2026 (CVsports)

2509.17889 2026-03-31 cs.LG

GaussianPSL: Soft partitioning for complex PSL problem

Phuong Mai Dinh, Van-Nam Huynh

2509.15673 2026-03-31 cs.RO

Omni-LIVO: Robust RGB-Colored Multi-Camera Visual-Inertial-LiDAR Odometry via Photometric Migration and ESIKF Fusion

Yinong Cao, Chenyang Zhang, Xin He, Yuwei Chen, Chengyu Pu, Bingtao Wang, Kaile Wu, Shouzheng Zhu, Fei Han, Shijie Liu, Chunlai Li, Jianyu Wang

Comments Accepted by IEEE Robotics and Automation Letters (RA-L). Early Access version available. This version supersedes all previous versions and is the official accepted manuscript for citation

2509.13007 2026-03-31 cs.LG

ReTrack: Data Unlearning in Diffusion Models through Redirecting the Denoising Trajectory

Qitan Shi, Cheng Jin, Jiawei Zhang, Yuantao Gu

Comments 22 pages, 12 figures, accepted by AISTATS 2026

2509.12573 2026-03-31 cs.LG cs.HC

No Need for Learning to Defer? A Training Free Deferral Framework to Multiple Experts through Conformal Prediction

Tim Bary, Benoît Macq, Louis Petit

Comments 11 pages, 3 figures, 1 table

2509.11474 2026-03-31 cs.SD cs.IR

Acoustic Overspecification in Electronic Dance Music Taxonomy

Weilun Xu, Tianhao Dai, Oscar Goudet, Xiaoxuan Wang

2509.07704 2026-03-31 cs.CV

SEEC: Segmentation-Assisted Multi-Entropy Models for Learned Lossless Image Compression

Chunhang Zheng, Zichang Ren, Dou Li

Comments Accepted by ICME 2026

2509.05970 2026-03-31 cs.CV

OmniStyle2: Learning to Stylize by Learning to Destylize

Ye Wang, Zili Yi, Yibo Zhang, Peng Zheng, Xuping Xie, Jiang Lin, Yijun Li, Yilin Wang, Rui Ma

Comments Our project page: https://wangyephd.github.io/projects/DeStyle/index.html

2509.02028 2026-03-31 cs.CV cs.CR

See No Evil: Adversarial Attacks Against Linguistic-Visual Association in Referring Multi-Object Tracking Systems

Halima Bouzidi, Haoyu Liu, Mohammad Abdullah Al Faruque

Comments Accepted to the NeurIPS 2025 Workshop on Reliable ML from Unreliable Data

2508.13773 2026-03-31 cs.LG cs.AI

PENGUIN: Enhancing Transformer with Periodic-Nested Group Attention for Long-term Time Series Forecasting

Tian Sun, Yuqi Chen, Weiwei Sun

2508.09428 2026-03-31 cs.CV cs.AI

What-Meets-Where: Unified Learning of Action and Contact Localization in Images

Yuxiao Wang, Yu Lei, Wolin Liang, Weiying Xue, Zhenao Wei, Nan Zhuang, Qi Liu

Comments Accepted by AAAI 2026

2508.04329 2026-03-31 cs.LG

Forgetting: A New Mechanism Towards Better Large Language Model Fine-tuning

Ali Taheri, Alireza Taban, Qizhou Wang, Shanshan Ye, Abdolreza Mirzaei, Tongliang Liu, Bo Han

2508.03100 2026-03-31 cs.CV

AVATAR: Reinforcement Learning to See, Hear, and Reason Over Video

Yogesh Kulkarni, Pooyan Fazli

Comments CVPR 2026

2508.01277 2026-03-31 cs.SD cs.LG eess.AS q-bio.QM

Foundation Models for Bioacoustics -- a Comparative Review

Raphael Schwinger, Paria Vali Zadeh, Lukas Rauch, Mats Kurz, Tom Hauschild, Sam Lapp, Sven Tomforde

Comments Preprint

2508.00947 2026-03-31 cs.RO cs.DC cs.NI

Service Discovery-Based Hybrid Network Middleware for Efficient Communication in Distributed Robotic Systems

Shiyao Sang, Yinggang Ling

Comments 8 pages, 8 figures, accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025

2507.21652 2026-03-31 cs.CL

UnsafeChain: Enhancing Reasoning Model Safety via Hard Cases

Raj Vardhan Tomar, Preslav Nakov, Yuxia Wang

2507.16279 2026-03-31 cs.CV

MAN++: Scaling Momentum Auxiliary Network for Supervised Local Learning in Vision Tasks

Junhao Su, Feiyu Zhu, Hengyu Shi, Tianyang Han, Yurui Qiu, Junfeng Luo, Xiaoming Wei, Jialin Gao

Comments Accepted by TPAMI

2507.03119 2026-03-31 cs.LG cs.AI physics.plasm-ph

Improving ideal MHD equilibrium accuracy with physics-informed neural networks

Timo Thun, Andrea Merlo, Rory Conlin, Dario Panici, Daniel Böckenhoff

Comments Submitted to Nuclear Fusion, 16 pages, 7 figures

2506.21356 2026-03-31 cs.CV

ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models

Hongbo Liu, Jingwen He, Yi Jin, Dian Zheng, Yuhao Dong, Fan Zhang, Ziqi Huang, Yinan He, Yangguang Li, Weichao Chen, Yu Qiao, Wanli Ouyang, Shengjie Zhao, Ziwei Liu

2506.08391 2026-03-31 cs.CV

SECOND: Mitigating Perceptual Hallucination in Vision-Language Models via Selective and Contrastive Decoding

Woohyeon Park, Woojin Kim, Jaeik Kim, Jaeyoung Do

2506.04156 2026-03-31 cs.CL

A Dataset for Addressing Patient's Information Needs related to Clinical Course of Hospitalization

Sarvesh Soni, Dina Demner-Fushman

DOI: 10.1038/s41597-026-06639-z

Abstract

Patients have distinct information needs about their hospitalization that can be addressed using clinical evidence from electronic health records (EHRs). While artificial intelligence (AI) systems show promise in meeting these needs, robust datasets are needed to evaluate the factual accuracy and relevance of AI-generated responses. To our knowledge, no existing dataset captures patient information needs in the context of their EHRs. We introduce ArchEHR-QA, an expert-annotated dataset based on real-world patient cases from intensive care unit and emergency department settings. The cases comprise questions posed by patients to public health forums, clinician-interpreted counterparts, relevant clinical note excerpts with sentence-level relevance annotations, and clinician-authored answers. To establish benchmarks for grounded EHR question answering (QA), we evaluated three open-weight large language models (LLMs)--Llama 4, Llama 3, and Mixtral--across three prompting strategies: generating (1) answers with citations to clinical note sentences, (2) answers before citations, and (3) answers from filtered citations. We assessed performance on two dimensions: Factuality (overlap between cited note sentences and ground truth) and Relevance (textual and semantic similarity between system and reference answers). The final dataset contains 134 patient cases. The answer-first prompting approach consistently performed best, with Llama 4 achieving the highest scores. Manual error analysis supported these findings and revealed common issues such as omitted key clinical evidence and contradictory or hallucinated content. Overall, ArchEHR-QA provides a strong benchmark for developing and evaluating patient-centered EHR QA systems, underscoring the need for further progress toward generating factual and relevant responses in clinical contexts.

2505.24862 2026-03-31 cs.CV

ViStoryBench: Comprehensive Benchmark Suite for Story Visualization

Cailin Zhuang, Ailin Huang, Yaoqi Hu, Jingwei Wu, Wei Cheng, Jiaqi Liao, Hongyuan Wang, Xinyao Liao, Weiwei Cai, Hengyuan Xu, Xuanyang Zhang, Xianfang Zeng, Zhewei Huang, Gang Yu, Chi Zhang

Comments Accepted by CVPR 2026. 44 Pages, Project Page: https://vistorybench.github.io, Code: https://github.com/vistorybench/vistorybench, Dataset: https://huggingface.co/datasets/ViStoryBench/ViStoryBench

2505.21545 2026-03-31 cs.CV cs.LG

Corruption-Aware Training of Latent Video Diffusion Models for Robust Text-to-Video Generation

Chika Maduabuchi, Hao Chen, Yujin Han, Jindong Wang

Comments ICLR 2026 ReALM-GEN

2505.17694 2026-03-31 cs.LG

CoDec: Prefix-Shared Decoding Kernel for LLMs

Zhibin Wang, Rui Ning, Chao Fang, Zhonghui Zhang, Xi Lin, Shaobo Ma, Mo Zhou, Xue Li, Zhongfeng Wang, Chengying Huan, Rong Gu, Kun Yang, Guihai Chen, Sheng Zhong, Chen Tian