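arXivDaily daily academic digest: synced with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.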

2604.25499 2026-04-29 cs.LG cs.NE

EvoTSC: Evolving Feature Learning Models for Time Series Classification via Genetic Programming

Xuanhao Yang, Bing Xue, Mengjie Zhang

English abstract

Time series classification is an important analytical task across diverse domains. However, its practical application is often hindered by the scarcity of labeled data and the requirement for substantial computational resources. To address these challenges, this paper proposes EvoTSC, a novel genetic programming approach designed to automatically evolve lightweight feature learning models for time series classification. The core of EvoTSC is a carefully designed multi-layer program structure that strategically embeds diverse forms of prior expert knowledge into the evolutionary process, effectively guiding the search toward operations known to be highly effective for time series analysis. To mitigate the common overfitting problem in time series classification, a tailored Pareto tournament selection strategy is proposed to favor models that perform consistently well across varying training data subsets, promoting the discovery of highly generalizable models. Extensive experiments conducted on univariate time series classification datasets demonstrate that EvoTSC significantly outperforms eleven benchmark methods in most comparisons. Further analyses verify the contribution of each component and the resource efficiency of the evolved models.
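
The selection idea in the abstract can be illustrated with a minimal sketch. It assumes each candidate model is scored by its accuracy on several training subsets and that those per-subset accuracies serve as the objectives; the abstract does not state the actual objectives, tournament size, or tie-breaking rule, so `dominates`, `pareto_tournament`, and the example scores are hypothetical and not taken from the paper.

```python
import random

def dominates(a, b):
    """True if score vector a is at least as good as b on every subset and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_tournament(population, scores, k, rng):
    """Sample k candidates and return one that no other sampled candidate dominates."""
    contestants = rng.sample(range(len(population)), k)
    front = [
        i for i in contestants
        if not any(dominates(scores[j], scores[i]) for j in contestants if j != i)
    ]
    # Assumption: ties on the non-dominated front are broken uniformly at random.
    return population[rng.choice(front)]

# Hypothetical accuracies of three candidate models on two training subsets.
population = ["model_a", "model_b", "model_c"]
scores = [(0.90, 0.60), (0.80, 0.80), (0.70, 0.70)]
winner = pareto_tournament(population, scores, k=3, rng=random.Random(0))
```

Here `model_c` can never win, because `model_b` is at least as accurate on both subsets. `model_a` and `model_b` are mutually non-dominated, so either may be selected.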

2604.25498 2026-04-29 cs.SD cs.AI

SymphonyGen: 3D Hierarchical Orchestral Generation with Controllable Harmony Skeleton

Xuzheng He, Nan Nan, Zhilin Wang, Ziyue Kang, Zhuoru Mo, Ao Li, Yu Pan, Xiaobing Li, Feng Yu, Xiaohong Guan

Comments 8 pages, 4 figures

2604.25496 2026-04-29 cs.AI

Improving Zero-Shot Offline RL via Behavioral Task Sampling

Nazim Bendib, Nicolas Perrin-Gilbert, Olivier Sigaud

2604.25482 2026-04-29 cs.CL cs.AI

From World-Gen to Quest-Line: A Dependency-Driven Prompt Pipeline for Coherent RPG Generation

Dominik Borawski, Marta Szulc, Robert Chudy, Małgorzata Giedrowicz, Piotr Mironowicz

Comments 13 pages, 1 figure, 5 listings

2604.25477 2026-04-29 cs.CV cs.AI

DDA-Thinker: Decoupled Dual-Atomic Reinforcement Learning for Reasoning-Driven Image Editing

Hanqing Yang, Qiang Zhou, Yongchao Du, Sashuai Zhou, Zhibin Wang, Jun Song, Tiezheng Ge, Cheng Yu, Bo Zheng

2604.25476 2026-04-29 cs.SD cs.CL

PSP: An Interpretable Per-Dimension Accent Benchmark for Indic Text-to-Speech

Venkata Pushpak Teja Menta

Comments 8 pages, 7 tables. Companion paper to Praxy Voice (arXiv:submission id - 7506231). Code: https://github.com/praxelhq/psp-eval; Centroids: https://huggingface.co/datasets/Praxel/psp-native-centroids

English abstract

Standard text-to-speech (TTS) evaluation measures intelligibility (WER, CER) and overall naturalness (MOS, UTMOS) but does not quantify accent. A synthesiser may score well on all four yet sound non-native on features that are phonemic in the target language. For Indic languages, these features include retroflex articulation, aspiration, vowel length, and the Tamil retroflex approximant (letter zha). We present PSP, the Phoneme Substitution Profile, an interpretable, per-phonological-dimension accent benchmark for Indic TTS. PSP decomposes accent into six complementary dimensions: retroflex collapse rate (RR), aspiration fidelity (AF), vowel-length fidelity (LF), Tamil-zha fidelity (ZF), Frechet Audio Distance (FAD), and prosodic signature divergence (PSD). The first four are measured via forced alignment plus native-speaker-centroid acoustic probes over Wav2Vec2-XLS-R layer-9 embeddings; the latter two are corpus-level distributional distances. In this v1 we benchmark four commercial and open-source systems (ElevenLabs v3, Cartesia Sonic-3, Sarvam Bulbul, Indic Parler-TTS) on Hindi, Telugu, and Tamil pilot sets, with a fifth system (Praxy Voice) included on all three languages, plus an R5->R6 case study on Telugu. Three findings: (i) retroflex collapse grows monotonically with phonological difficulty Hindi < Telugu < Tamil (~1%, ~40%, ~68%); (ii) PSP ordering diverges from WER ordering -- commercial WER-leaders do not uniformly lead on retroflex or prosodic fidelity; (iii) no single system is Pareto-optimal across all six dimensions. We release native reference centroids (500 clips per language), 1000-clip embeddings for FAD, 500-clip prosodic feature matrices for PSD, 300-utterance golden sets per language, scoring code under MIT, and centroids under CC-BY. Formal MOS-correlation is deferred to v2; v1 reports five internal-consistency signals plus a native-audio sanity check.
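
As a rough illustration of a centroid-based probe, the sketch below computes a collapse rate as the fraction of expected-retroflex segments whose embedding lies closer to a contrasting centroid than to the retroflex centroid. This definition, the choice of a dental centroid as the contrast, the use of cosine similarity, and the toy two-dimensional vectors are all assumptions made for illustration; the abstract states only that the probes use native-speaker centroids over Wav2Vec2-XLS-R layer-9 embeddings after forced alignment.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def collapse_rate(segments, retroflex_centroid, dental_centroid):
    """Fraction of expected-retroflex segments that sit closer to the dental centroid."""
    collapsed = sum(
        1 for emb in segments
        if cosine(emb, dental_centroid) > cosine(emb, retroflex_centroid)
    )
    return collapsed / len(segments)

# Toy 2-D stand-ins for real segment embeddings (hypothetical values).
retroflex_centroid = [1.0, 0.0]
dental_centroid = [0.0, 1.0]
segments = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.9]]
rate = collapse_rate(segments, retroflex_centroid, dental_centroid)
```

In this toy input, two of the four segments fall nearer the dental centroid, so `rate` is 0.5.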

2604.25472 2026-04-29 cs.AI

SciEval: A Benchmark for Automatic Evaluation of K-12 Science Instructional Materials

Zhaohui Li, Peng He, Zhiyuan Chen, Honglu Liu, Zeyuan Wang, Tingting Li, Jinjun Xiong

2604.25467 2026-04-29 cs.LG math.OC

Subspace Optimization for Efficient Federated Learning under Heterogeneous Data

Shuchen Zhu, Zhengyang Huang, Yuqi Xu, Peijin Li

2604.25466 2026-04-29 cs.CV

Generalizable Human Gaussian Splatting via Multi-view Semantic Consistency

Jingi Kim, Wonjun Kim

Comments 10 pages, 8 figures, CVPR 2026 Findings

2604.25459 2026-04-29 cs.RO

GS-Playground: A High-Throughput Photorealistic Simulator for Vision-Informed Robot Learning

Yufei Jia, Heng Zhang, Ziheng Zhang, Junzhe Wu, Mingrui Yu, Zifan Wang, Dixuan Jiang, Zheng Li, Chenyu Cao, Zhuoyuan Yu, Xun Yang, Haizhou Ge, Yuchi Zhang, Jiayuan Zhang, Zhenbiao Huang, Tianle Liu, Shenyu Chen, Jiacheng Wang, Bin Xie, Xuran Yao, Xiwa Deng, Guangyu Wang, Jinzhi Zhang, Lei Hao, Zhixing Chen, Yuxiang Chen, Anqi Wang, Hongyun Tian, Yiyi Yan, Zhanxiang Cao, Yizhou Jiang, Hanyang Shao, Yue Li, Lu Shi, Bokui Chen, Wei Sui, Hanqing Cui, Yusen Qin, Ruqi Huang, Lei Han, Tiancai Wang, Guyue Zhou

Comments Robotics: Science and Systems 2026

2604.25457 2026-04-29 cs.CV

GramSR: Visual Feature Conditioning for Diffusion-Based Super-Resolution

Fabio D'Oronzio, Federico Putamorsi, Leonardo Zini, Marcella Cornia, Lorenzo Baraldi

Comments Accepted at the 28th International Conference on Pattern Recognition

2604.25456 2026-04-29 cs.CL cs.AI

An Investigation of Linguistic Biases in LLM-Based Recommendations

Nitin Venkateswaran, Jason Ang, Deep Adhikari, Tarun Krishna Dasari

English abstract

We investigate linguistic biases in LLM-based restaurant and product recommendations given prompts varying across Southern American English (AE), Indian English (IE), and Code-Switched Hindi-English dialects, using the Yelp Open dataset (Yelp Inc., 2023) and the Walmart product reviews dataset (PromptCloud, 2020). We add lists of restaurant and product names balanced by cuisine type and product category to the prompts given to the LLM, and we zero-shot prompt the LLMs in a cold-start setting to select the top-20 restaurant and product recommendations from these lists for each of the dialect-varied prompts. We prompt LLMs using different list samples across 20 seeds for better generalization, and aggregate per-cuisine-type and per-category response counts for each seed, question/prompt, and LLM. We run mixed-effects regression models for each model family and topic (restaurant/product) with the aggregate response counts as the dependent variable, and conduct likelihood ratio tests for the fixed effects with post-hoc pairwise testing of differences in estimated marginal means, to investigate group-level differences in recommendation counts by model size and dialect type. Results show that dialect plays a role in the type of restaurant selected across the models tested, with the mistral-small-3.1 model and both llama-3.1 family models showing more sensitivity to Indian English and Code-Switched prompts. In terms of product recommendations, the llama-3.1-70B-model is particularly sensitive to Code-Switched prompts in four out of seven categories, and more beauty and home category recommendations are seen when using the Indian English and Code-Switched prompts for larger and smaller models, respectively. No broad trends are seen in the model-size-based differences, with differing recommendations based on model sizes conditioned by the type of dialect.
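
The likelihood ratio test for a fixed effect compares a full model against a nested model without that effect: the statistic is twice the gain in log-likelihood, referred to a chi-square distribution whose degrees of freedom equal the number of parameters dropped. The sketch below works through that arithmetic with hypothetical log-likelihood values (none are from the paper) and covers only 1 or 2 degrees of freedom, where the chi-square tail has a closed form. A three-level dialect factor under treatment coding would contribute 2 parameters, which is why `df=2` is used in the example.

```python
import math

def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square variable, for df of 1 or 2 only."""
    if stat <= 0.0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")

def likelihood_ratio_test(loglik_reduced, loglik_full, df):
    """Return (statistic, p-value) for two nested models fitted by maximum likelihood."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)

# Hypothetical log-likelihoods: a model without and with the dialect effect.
stat, p_value = likelihood_ratio_test(loglik_reduced=-1204.3, loglik_full=-1199.1, df=2)
```

When the models differ in their fixed effects, both should be fitted by maximum likelihood rather than REML, since REML likelihoods are not comparable across different fixed-effect structures.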

2604.25452 2026-04-29 cs.CL

Benchmarking Logistic Regression, SVM, and LightGBM Against BiLSTM with Attention for Sentiment Analysis on Indonesian Product Reviews

Razin Hafid Hamdi, Ivana Margareth Hutabarat, Hanna Gresia Sinaga, Luluk Muthoharoh, Ardika Satria, Martin C. T. Manullang

Comments 6 pages, 2 figures. Benchmarking study comparing PyCaret-based machine learning models (Logistic Regression, SVM, LightGBM) with a BiLSTM+Attention model for sentiment analysis on Indonesian product reviews

2604.25448 2026-04-29 cs.CL

Navigating Global AI Regulation: A Multi-Jurisdictional Retrieval-Augmented Generation System

Courtney Ford, Ojas Rane, Susan Leavy

Comments Preprint. Accepted at PoliticalNLP Workshop, LREC 2026. 10 pages, 1 figure

2604.25444 2026-04-29 cs.CL

One Refiner to Unlock Them All: Inference-Time Reasoning Elicitation via Reinforcement Query Refinement

Yixiao Zhou, Dongzhou Cheng, Zhiliang Wu, Yi Yang, Yu Cheng, Hehe Fan

Comments Accepted to ACL26

2604.25441 2026-04-29 cs.SD cs.CL eess.AS

Praxy Voice: Voice-Prompt Recovery + BUPS for Commercial-Class Indic TTS from a Frozen Non-Indic Base at Zero Commercial-Training-Data Cost

Venkata Pushpak Teja Menta

Comments 9 pages, 6 figures, 6 tables. Companion paper to PSP benchmark. Code: https://github.com/praxelhq/praxy ; Model: https://huggingface.co/Praxel/praxy-voice-r6 ; Demo: https://huggingface.co/spaces/Praxel/praxy-voice-demo

2604.25435 2026-04-29 cs.AI

PI-TTA: Physics-Informed Source-Free Test-Time Adaptation for Robust Human Activity Recognition on Mobile Devices

Changyu Li, Lu Wang, Ming Lei, Jiashen Liu, Yichen Zhang, Kaishun Wu, Fei Luo

Comments 16 pages, 11 figures

2604.25427 2026-04-29 cs.CV

A Systematic Post-Train Framework for Video Generation

Zeyue Xue, Siming Fu, Jie Huang, Shuai Lu, Haoran Li, Yijun Liu, Yuming Li, Xiaoxuan He, Mengzhao Chen, Haoyang Huang, Nan Duan, Ping Luo

Comments Tech report

2604.25423 2026-04-29 cs.CL cs.AI

Do LLMs Capture Embodied Cognition and Cultural Variation? Cross-Linguistic Evidence from Demonstratives

Yu Wang, Emmanuele Chersoni, Chu-Ren Huang

Comments Accepted to ACL 2026

2604.25421 2026-04-29 cs.LG cs.AI

FED-FSTQ: Fisher-Guided Token Quantization for Communication-Efficient Federated Fine-Tuning of LLMs on Edge Devices

Changyu Li, Shuanghong Huang, Jiashen Liu, Ming Lei, Jidu Xing, Kaishun Wu, Lu Wang, Fei Luo

Comments 19 pages, 15 figures

2604.25419 2026-04-29 cs.AI

JURY-RL: Votes Propose, Proofs Dispose for Label-Free RLVR

Xinjie Chen, Biao Fu, Jing Wu, Guoxin Chen, Xinggao Liu, Dayiheng Liu, Minpeng Liao

Comments Preprint. 32 pages, 9 figures

2604.25416 2026-04-29 cs.LG

Biased Dreams: Limitations to Epistemic Uncertainty Quantification in Latent Space Models

Julia Berger, Bernd Frauenknecht, Sebastian Trimpe, Bastian Leibe

2604.25409 2026-04-29 cs.CL

Scaling Probabilistic Transformer via Efficient Cross-Scale Hyperparameter Transfer

Penghao Kuang, Haoyi Wu, Kewei Tu

2604.25408 2026-04-29 cs.CV

Beyond Fidelity: Semantic Similarity Assessment in Low-Level Image Processing

Runjie Wang, Weiling Chen, Tiesong Zhao, Chang Wen Chen

2604.25405 2026-04-29 cs.CV cs.RO

Leveraging Previous-Traversal Point Cloud Map Priors for Camera-Based 3D Object Detection and Tracking

Markus Käppeler, Özgün Çiçek, Yakov Miron, Abhinav Valada

2604.25404 2026-04-29 cs.RO

Robust Graph Matching through Semantic Relationship Generation for SLAM

David Perez-Saura, Jose Andres Millan-Romera, Miguel Fernandez-Cortizas, Holger Voos, Pascual Campoy, Jose Luis Sanchez-Lopez

Comments 7 pages, 5 figures

2604.25392 2026-04-29 cs.CL

Benchmarking PyCaret AutoML Against IndoBERT Fine-Tuning for Sentiment Analysis on Indonesian IKN Twitter Data

Mutia Alfi Mayzaroh, Dwi Fitria Ningsih, Nindi Destriani, Martin C. T. Manullang

Comments 10 pages, 5 figures, 4 tables. Presented as a benchmarking study on Indonesian sentiment analysis using PyCaret and IndoBERT

2604.25388 2026-04-29 cs.CV cs.RO

COMPASS: COmpact Multi-channel Prior-map And Scene Signature for Floor-Plan-Based Visual Localization

Muhammad Shaheer, Miguel Fernandez-Cortizas, Asier Bikandi-Noya, Holger Voos, Jose Luis Sanchez-Lopez

2604.25383 2026-04-29 cs.SD cs.AI eess.AS

ML-SAN: Multi-Level Speaker-Adaptive Network for Emotion Recognition in Conversations

Kexue Wang, Yinfeng Yu, Liejun Wang

Comments Main paper (12 pages). Accepted for publication by International Conference on Intelligent Computing 2026

English abstract

To establish empathy with machines, it is essential to fully understand human emotional changes. However, research in multimodal emotion recognition often overlooks one problem: individual expressive traits vary significantly, so different people may express the same emotion differently. This is visible in daily life: some people express "happiness" through their facial expressions and words, while others hide it or express it through their actions. Both are expressions of "happiness", but such differences in emotional expression remain difficult for machines to distinguish. Current emotion recognition remains at a "static" level, using a single recognition model to identify all emotional styles. This simplification often degrades recognition results, especially in multi-turn dialogues. To address this problem, this paper introduces a novel Multi-Level Speaker-Adaptive Network (ML-SAN) that targets the confusion of speaker identity information. ML-SAN does not simply assign a speaker's ID after recognition; instead, it employs a three-stage adaptive process. First, Input-level Calibration uses Feature-wise Linear Modulation (FiLM) to adjust the raw audio and visual features into a neutral, speaker-independent space. Then, Interaction-level Gating re-adjusts the trust level for each modality (e.g., voice or facial features) based on the speaker's identity information. Finally, Output-level Regularization maintains the consistency of speaker features in the latent space. Tests on the MELD and IEMOCAP datasets show that ML-SAN achieves better results, performs exceptionally well on challenging tail sentiment categories, and better addresses the diversity of speakers in real-world scenarios.
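
FiLM, commonly expanded as feature-wise linear modulation, scales and shifts each feature with coefficients predicted from a conditioning input. The sketch below shows that general operation with a speaker embedding as the condition. The layer shapes, weights, and function names are hypothetical, and the abstract does not describe how ML-SAN actually parameterizes its calibration step.

```python
def linear(weights, bias, vec):
    """Affine map: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]

def film(features, condition, gamma_w, gamma_b, beta_w, beta_b):
    """Apply gamma(condition) * features + beta(condition), element by element."""
    gamma = linear(gamma_w, gamma_b, condition)  # per-feature scale
    beta = linear(beta_w, beta_b, condition)     # per-feature shift
    return [g * x + b for g, x, b in zip(gamma, features, beta)]

# Hypothetical 2-D features and speaker embedding with hand-picked weights.
features = [2.0, -1.0]
speaker_emb = [1.0, 0.0]
gamma_w, gamma_b = [[0.5, 0.0], [0.0, 1.0]], [1.0, 1.0]
beta_w, beta_b = [[0.0, 0.0], [1.0, 0.0]], [0.0, 0.0]
calibrated = film(features, speaker_emb, gamma_w, gamma_b, beta_w, beta_b)
```

With these weights the speaker embedding yields scales `[1.5, 1.0]` and shifts `[0.0, 1.0]`, so `calibrated` is `[3.0, 0.0]`.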

2604.25379 2026-04-29 cs.LG cs.AI

Safe-Support Q-Learning: Learning without Unsafe Exploration

Yeeun Lim, Narim Jeong, Donghwan Lee

Comments 26 pages