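arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.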

2603.11783 2026-03-13 cs.CV cs.AI

HELM: Hierarchical and Explicit Label Modeling with Graph Learning for Multi-Label Image Classification

Marjan Stoimchev, Boshko Koloski, Jurica Levatić, Dragi Kocev, Sašo Džeroski

Comments Accepted and presented at REO workshop at EurIPS 2025

2603.11781 2026-03-13 cs.AI cs.CL cs.MA

From Debate to Deliberation: Structured Collective Reasoning with Typed Epistemic Acts

Sunil Prakash

Comments 26 pages, 6 tables, 2 figures, 2 listings

2603.11780 2026-03-13 cs.CL

Large Language Models for Biomedical Article Classification

Jakub Proboszcz, Paweł Cichosz

Comments 63 pages, 25 tables, 4 figures

2603.11778 2026-03-13 cs.CL

Trust Oriented Explainable AI for Fake News Detection

Krzysztof Siwek, Daniel Stankowski, Maciej Stodolski

Comments 9 pages, 4 figures, 2 tables

2603.11772 2026-03-13 cs.CL

Legal-DC: Benchmarking Retrieval-Augmented Generation for Legal Documents

Yaocong Li, Qiang Lan, Leihan Zhang, Le Zhang

Comments 20 pages, 4 figures, to be submitted to a conference/journal

2603.11770 2026-03-13 cs.AI cs.CL

An Automatic Text Classification Method Based on Hierarchical Taxonomies, Neural Networks and Document Embedding: The NETHIC Tool

Luigi Lomasto, Rosario Di Florio, Andrea Ciapetti, Giuseppe Miscione, Giulia Ruggiero, Daniele Toti

Comments ICEIS 2019 Conference

2603.11767 2026-03-13 cs.AI

Understanding Wikidata Qualifiers: An Analysis and Taxonomy

Gilles Falquet, Sahar Aljalbout

2603.11764 2026-03-13 cs.LG stat.ML

A Further Efficient Algorithm with Best-of-Both-Worlds Guarantees for $m$-Set Semi-Bandit Problem

Botao Chen, Jongyeong Lee, Chansoo Kim, Junya Honda

2603.11757 2026-03-13 cs.LG cs.AI stat.ML

Exploiting Expertise of Non-Expert and Diverse Agents in Social Bandit Learning: A Free Energy Approach

Erfan Mirzaei, Seyed Pooya Shariatpanahi, Alireza Tavakoli, Reshad Hosseini, Majid Nili Ahmadabadi

2603.11756 2026-03-13 cs.AI cs.LG

Anomaly detection in time-series via inductive biases in the latent space of conditional normalizing flows

David Baumgartner, Eliezer de Souza da Silva, Iñigo Urteaga

2603.11745 2026-03-13 cs.AI cs.LG

CINDI: Conditional Imputation and Noisy Data Integrity with Flows in Power Grid Data

David Baumgartner, Helge Langseth, Heri Ramampiaro

2603.11743 2026-03-13 cs.CL

Semi-Synthetic Parallel Data for Translation Quality Estimation: A Case Study of Dataset Building for an Under-Resourced Language Pair

Assaf Siani, Anna Kernerman, Ilan Kernerman

2603.11736 2026-03-13 cs.AI

Gender Bias in Generative AI-assisted Recruitment Processes

Martina Ullasci, Marco Rondina, Riccardo Coppola, Antonio Vetrò

Comments 4 pages, 4 figures

2603.11734 2026-03-13 cs.CV

VTEdit-Bench: A Comprehensive Benchmark for Multi-Reference Image Editing Models in Virtual Try-On

Xiaoye Liang, Zhiyuan Qu, Mingye Zou, Jiaxin Liu, Lai Jiang, Mai Xu, Yiheng Zhu

2603.11725 2026-03-13 cs.CV cs.LG

Cross-Resolution Attention Network for High-Resolution PM2.5 Prediction

Ammar Kheder, Helmi Toropainen, Wenqing Peng, Samuel Antão, Zhi-Song Liu, Michael Boy

2603.11709 2026-03-13 cs.AI

Scaling Laws for Educational AI Agents

Mengsong Wu, Hao Hao, Shuzhen Bi, Keqian Li, Wentao Liu, Siyu Song, Hongbo Zhao, Aimin Zhou

Comments 19 pages, 6 figures, 3 tables, 1 algorithm

2603.11695 2026-03-13 cs.CV cond-mat.mtrl-sci

PolyCrysDiff: Controllable Generation of Three-Dimensional Computable Polycrystalline Material Structures

Chi Chen, Tianle Jiang, Xiaodong Wei, Yanming Wang

2603.11691 2026-03-13 cs.AI

STAIRS-Former: Spatio-Temporal Attention with Interleaved Recursive Structure Transformer for Offline Multi-task Multi-agent Reinforcement Learning

Jiwon Jeon, Myungsik Cho, Youngchul Sung

2603.11686 2026-03-13 cs.CL

In the LLM era, Word Sense Induction remains unsolved

Anna Mosolova, Marie Candito, Carlos Ramisch

Comments Accepted at ACL 2025 (Findings)

2603.11683 2026-03-13 cs.SD cs.AI cs.LG

Causal Prosody Mediation for Text-to-Speech: Counterfactual Training of Duration, Pitch, and Energy in FastSpeech2

Suvendu Sekhar Mohanty

Abstract

We propose a novel causal prosody mediation framework for expressive text-to-speech (TTS) synthesis. Our approach augments the FastSpeech2 architecture with explicit emotion conditioning and introduces counterfactual training objectives to disentangle emotional prosody from linguistic content. By formulating a structural causal model of how text (content), emotion, and speaker jointly influence prosody (duration, pitch, energy) and ultimately the speech waveform, we derive two complementary loss terms: an Indirect Path Constraint (IPC) to enforce that emotion affects speech only through prosody, and a Counterfactual Prosody Constraint (CPC) to encourage distinct prosody patterns for different emotions. The resulting model is trained on multi-speaker emotional corpora (LibriTTS, EmoV-DB, VCTK) with a combined objective that includes standard spectrogram reconstruction and variance prediction losses alongside our causal losses. In evaluations on expressive speech synthesis, our method achieves significantly improved prosody manipulation and emotion rendering, with higher mean opinion scores (MOS) and emotion accuracy than baseline FastSpeech2 variants. We also observe better intelligibility (low WER) and speaker consistency when transferring emotions across speakers. Extensive ablations confirm that the causal objectives successfully separate prosody attribution, yielding an interpretable model that allows controlled counterfactual prosody editing (e.g. "same utterance, different emotion") without compromising naturalness. We discuss the implications for identifiability in prosody modeling and outline limitations such as the assumption that emotion effects are fully captured by pitch, duration, and energy. Our work demonstrates how integrating causal learning principles into TTS can improve controllability and expressiveness in generated speech.

2603.11682 2026-03-13 cs.LG cs.AI

Entropy-Preserving Reinforcement Learning

Aleksei Petrenko, Ben Lipkin, Kevin Chen, Erik Wijmans, Marco Cusumano-Towner, Raja Giryes, Philipp Krähenbühl

Comments Published at ICLR 2026

2603.11675 2026-03-13 cs.CV

PROMO: Promptable Outfitting for Efficient High-Fidelity Virtual Try-On

Haohua Chen, Tianze Zhou, Wei Zhu, Runqi Wang, Yandong Guan, Dejia Song, Yibo Chen, Xu Tang, Yao Hu, Lu Sheng, Zhiyong Wu

Comments CVPR 2026

2603.11661 2026-03-13 cs.SD

Resonate: Reinforcing Text-to-Audio Generation via Online Feedback from Large Audio Language Models

Xiquan Li, Junxi Liu, Wenxi Chen, Haina Zhu, Ziyang Ma, Xie Chen

2603.11659 2026-03-13 cs.CV

FL-MedSegBench: A Comprehensive Benchmark for Federated Learning on Medical Image Segmentation

Meilu Zhu, Zhiwei Wang, Axiu Mao, Yuxing Li, Xiaohan Xing, Yixuan Yuan, Edmund Y. Lam

Comments 19 pages, 4 figures

2603.11658 2026-03-13 cs.RO

Coupling Tensor Trains with Graph of Convex Sets: Effective Compression, Exploration, and Planning in the C-Space

Gerhard Reinerth, Riddhiman Laha, Marcello Romano

Comments 8 pages, 10 figures, accepted paper for ICRA2026

2603.11650 2026-03-13 cs.CL

QChunker: Learning Question-Aware Text Chunking for Domain RAG via Multi-Agent Debate

Jihao Zhao, Daixuan Li, Pengfei Li, Shuaishuai Zu, Biao Qin, Hongyan Liu

2603.11644 2026-03-13 cs.CV cs.AI

IDRL: An Individual-Aware Multimodal Depression-Related Representation Learning Framework for Depression Diagnosis

Chongxiao Wang, Junjie Liang, Peng Cao, Jinzhu Yang, Osmar R. Zaiane

2603.11640 2026-03-13 cs.CV cs.AI

Tokenization Allows Multimodal Large Language Models to Understand, Generate and Edit Architectural Floor Plans

Sizhong Qin, Ramon Elias Weber, Xinzheng Lu

Comments 20 pages, 9 figures. Accepted to CVPR 2026

2603.11638 2026-03-13 cs.RO

Learn Structure, Adapt on the Fly: Multi-Scale Residual Learning and Online Adaptation for Aerial Manipulators

Samaksh Ujjawal, Naveen Sudheer Nair, Shivansh Pratap Singh, Rishabh Dev Yadav, Wei Pan, Spandan Roy

2603.11634 2026-03-13 cs.RO

Diversity You Can Actually Measure: A Fast, Model-Free Diversity Metric for Robotics Datasets

Sreevardhan Sirigiri, Nathan Samuel de Lara, Christopher Agia, Florian Shkurti, Fabio Ramos