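arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.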

2604.02468 2026-04-06 cs.CV cs.AI

Hierarchical, Interpretable, Label-Free Concept Bottleneck Model

Haodong Xie, Yujun Cai, Rahul Singh Maharjan, Yiwei Wang, Federico Tavella, Angelo Cangelosi

Abstract

Concept Bottleneck Models (CBMs) introduce interpretability to black-box deep learning models by predicting labels through human-understandable concepts. However, unlike humans, who identify objects at different levels of abstraction using both general and specific features, existing CBMs operate at a single semantic level in both concept and label space. We propose HIL-CBM, a Hierarchical Interpretable Label-Free Concept Bottleneck Model that extends CBMs into a hierarchical framework to enhance interpretability by more closely mirroring the human cognitive process. HIL-CBM enables classification and explanation across multiple semantic levels without requiring relational concept annotations. HIL-CBM aligns the abstraction level of concept-based explanations with that of model predictions, progressing from abstract to concrete. This is achieved by (i) introducing a gradient-based visual consistency loss that encourages abstraction layers to focus on similar spatial regions, and (ii) training dual classification heads, each operating on feature concepts at different abstraction levels. Experiments on benchmark datasets demonstrate that HIL-CBM outperforms state-of-the-art sparse CBMs in classification accuracy. Human evaluations further show that HIL-CBM provides more interpretable and accurate explanations, while maintaining a hierarchical and label-free approach to feature concepts.

2604.02459 2026-04-06 cs.LG cs.AI cs.CL

On the Geometric Structure of Layer Updates in Deep Language Models

Jun-Sik Yoo

Comments 11 pages, 5 figures

2604.02457 2026-04-06 cs.CV cs.CR

Street-Legal Physical-World Adversarial Rim for License Plates

Nikhil Kalidasu, Sahana Ganapathy

Comments 20 pages, 8 figures, 5 tables, submitted to Security in Machine Learning Applications 2026

2604.02451 2026-04-06 cs.CL cs.AI

Skeleton-based Coherence Modeling in Narratives

Nishit Asnani, Rohan Badlani

2604.02450 2026-04-06 cs.LG cs.AI cs.CL

Do We Need Frontier Models to Verify Mathematical Proofs?

Aaditya Naik, Guruprerana Shabadi, Rajeev Alur, Mayur Naik

Comments 21 pages, 11 figures

2604.02447 2026-04-06 cs.CV cs.AI cs.LG

PlayGen-MoG: Framework for Diverse Multi-Agent Play Generation via Mixture-of-Gaussians Trajectory Prediction

Kevin Song

Comments 9 pages, 4 figures, 2 tables. Accepted to CVPRW 2026

2604.02446 2026-04-06 cs.CV cs.AI

From Elevation Maps To Contour Lines: SVM and Decision Trees to Detect Violin Width Reduction

Philémon Beghin, Anne-Emmanuelle Ceulemans, François Glineur

Comments Paper accepted for the Florence Heri-Tech 2026 Conference

2604.02441 2026-04-06 cs.RO

Adaptive Learned State Estimation based on KalmanNet

Arian Mehrfard, Bharanidhar Duraisamy, Stefan Haag, Florian Geiss, Mirko Mählisch

2604.02434 2026-04-06 cs.AI

Compositional Neuro-Symbolic Reasoning

Anugyan Das, Omkar Ghugarkar, Vishvesh Bhat, Asad Aali

2604.02430 2026-04-06 cs.LG cs.AI

Self-Directed Task Identification

Timothy Gould, Sidike Paheding

Comments 9 pages, 3 figures, 3 tables, 17 equations

2604.02423 2026-04-06 cs.CL cs.CY

SWAY: A Counterfactual Computational Linguistic Approach to Measuring and Mitigating Sycophancy

Joy Bhalla, Kristina Gligorić

2604.02409 2026-04-06 cs.CV cs.AI

LumiVideo: An Intelligent Agentic System for Video Color Grading

Yuchen Guo, Junli Gong, Hongmin Cai, Yiu-ming Cheung, Weifeng Su

2604.02401 2026-04-06 cs.RO cs.SY eess.SY

Backup-Based Safety Filters: A Comparative Review of Backup CBF, Model Predictive Shielding, and gatekeeper

Taekyung Kim, Aswin D. Menon, Akshunn Trivedi, Dimitra Panagou

Comments Project page: https://www.taekyung.me/backup-safety-filters

2604.02397 2026-04-06 cs.CV cs.AI

Variational Encoder-Multi-Decoder (VE-MD) for Privacy-by-functional-design (Group) Emotion Recognition

Anderson Augusma, Dominique Vaufreydaz, Fédérique Letué

Abstract

Group Emotion Recognition (GER) aims to infer collective affect in social environments such as classrooms, crowds, and public events. Many existing approaches rely on explicit individual-level processing, including cropped faces, person tracking, or per-person feature extraction, which makes the analysis pipeline person-centric and raises privacy concerns in deployment scenarios where only group-level understanding is needed. This research proposes VE-MD, a Variational Encoder-Multi-Decoder framework for group emotion recognition under a privacy-aware functional design. Rather than providing formal anonymization or cryptographic privacy guarantees, VE-MD is designed to avoid explicit individual monitoring by constraining the model to predict only aggregate group-level affect, without identity recognition or per-person emotion outputs. VE-MD learns a shared latent representation jointly optimized for emotion classification and internal prediction of body and facial structural representations. Two structural decoding strategies are investigated: a transformer-based PersonQuery decoder and a dense Heatmap decoder that naturally accommodates variable group sizes. Experiments on six in-the-wild datasets, including two GER and four Individual Emotion Recognition (IER) benchmarks, show that structural supervision consistently improves representation learning. More importantly, the results reveal a clear distinction between GER and IER: optimizing the latent space alone is often insufficient for GER because it tends to attenuate interaction-related cues, whereas preserving explicit structural outputs improves collective affect inference. In contrast, projected structural representations seem to act as an effective denoising bottleneck for IER. VE-MD achieves state-of-the-art performance on GAF-3.0 (up to 90.06%) and VGAF (82.25% with multimodal fusion with audio). 
These results show that preserving interaction-related structural information is particularly beneficial for group-level affect modeling without relying on prior individual feature extraction. On IER datasets, using multimodal fusion with the audio modality, VE-MD outperforms the state of the art on SamSemo (77.9%, with the text modality added) while achieving competitive performance on MER-MULTI (63.8%), DFEW (70.7%), and EngageNet (69.0%).

2604.02396 2026-04-06 cs.CV cs.AI

Environment-Aware Channel Prediction for Vehicular Communications: A Multimodal Visual Feature Fusion Framework

Xuejian Zhang, Ruisi He, Minseok Kim, Inocent Calist, Mi Yang, Ziyi Qi

Comments 13 pages, 14 figures

2604.02392 2026-04-06 cs.CV

Beyond Fixed Inference: Quantitative Flow Matching for Adaptive Image Denoising

Jigang Duan, Genwei Ma, Xu Jiang, Wenfeng Xu, Ping Yang, Xing Zhao

2604.02391 2026-04-06 cs.SD cs.AI eess.AS

Reliability-Aware Geometric Fusion for Robust Audio-Visual Navigation

Teng Liu, Yinfeng Yu

Comments Main paper (6 pages). Accepted for publication by the International Joint Conference on Neural Networks (IJCNN 2026)

2604.02390 2026-04-06 cs.SD cs.AI eess.AS

Spatial-Aware Conditioned Fusion for Audio-Visual Navigation

Shaohang Wu, Yinfeng Yu

Comments Main paper (6 pages). Accepted for publication by the International Joint Conference on Neural Networks (IJCNN 2026)

2604.02389 2026-04-06 cs.SD cs.AI eess.AS

Audio Spatially-Guided Fusion for Audio-Visual Navigation

Xinyu Zhou, Yinfeng Yu

Comments Main paper (6 pages). Accepted for publication by the International Joint Conference on Neural Networks (IJCNN 2026)

2604.02371 2026-04-06 cs.CV cs.AI cs.CL

Internalized Reasoning for Long-Context Visual Document Understanding

Austin Veselka

Comments 9 pages

2604.02362 2026-04-06 cs.CL cs.AI cs.SD

CIPHER: Conformer-based Inference of Phonemes from High-density EEG

Varshith Madishetty

2604.02359 2026-04-06 cs.CL cs.AI

Using LLM-as-a-Judge/Jury to Advance Scalable, Clinically-Validated Safety Evaluations of Model Responses to Users Demonstrating Psychosis

May Lynn Reese, Markela Zeneli, Mindy Ng, Jacob Haimes, Andreea Damien, Elizabeth Stade

Comments published at IASEAI 2026, preliminary work presented at GenAI4Health workshop at NeurIPS 2025

2604.02355 2026-04-06 cs.LG cs.CV

From Broad Exploration to Stable Synthesis: Entropy-Guided Optimization for Autoregressive Image Generation

Han Song, Yucheng Zhou, Jianbing Shen, Yu Cheng

2604.02353 2026-04-06 cs.LG cs.AI

Prism: Policy Reuse via Interpretable Strategy Mapping in Reinforcement Learning

Thomas Pravetz

Comments 13 pages, 3 figures, 5 tables

2604.02352 2026-04-06 cs.LG cs.AI cs.SE

An Initial Exploration of Contrastive Prompt Tuning to Generate Energy-Efficient Code

Sophie Weidmann, Fernando Castor

Comments Published at the Third International Workshop on Large Language Models for Code (LLM4Code 2026)

2604.02351 2026-04-06 cs.LG

Modeling and Controlling Deployment Reliability under Temporal Distribution Shift

Naimur Rahman, Naazreen Tabassum

Comments 19 pages, 5 figures, 7 tables. Empirical study on temporally indexed credit-risk dataset (1.35M samples, 2007-2018)

2604.02350 2026-04-06 cs.LG cs.AI

Differentiable Symbolic Planning: A Neural Architecture for Constraint Reasoning with Learned Feasibility

Venkatakrishna Reddy Oruganti

Comments 12 pages, 4 figures, 7 tables

2604.02349 2026-04-06 cs.LG cs.AI

OPRIDE: Offline Preference-based Reinforcement Learning via In-Dataset Exploration

Yiqin Yang, Hao Hu, Yihuan Mao, Jin Zhang, Chengjie Wu, Yuhua Jiang, Xu Yang, Runpeng Xie, Yi Fan, Bo Liu, Yang Gao, Bo Xu, Chongjie Zhang

2604.02348 2026-04-06 cs.LG

Contextual Intelligence: The Next Leap for Reinforcement Learning

André Biedenkapp

Comments Accepted to AAMAS 2025 (Blue Sky Ideas Track)

2604.02347 2026-04-06 cs.LG

FTimeXer: Frequency-aware Time-series Transformer with Exogenous variables for Robust Carbon Footprint Forecasting

Qingzhong Li, Yue Hu, Zhou Long, Qingchang Ma, Hui Ma, Jinhai Sa

Comments Accepted by The 5th International Conference on Electronics Technology and Artificial Intelligence (ETAI 2026)