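arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.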

2604.23912 2026-04-28 cs.LG stat.ML

Gromov-Wasserstein Methods for Multi-View Relational Embedding and Clustering

Rafael Pereira Eufrazio, Eduardo Fernandes Montesuma, Charles Casimiro Cavalcante

Comments This manuscript is currently under review at the XLIV Simposio Brasileiro de Telecomunicacoes e Processamento de Sinais - SBrT (Brazilian Symposium on Telecommunications and Signal Processing) 2026

2604.23909 2026-04-28 cs.CV

AMAVA: Adaptive Motion-Aware Video-to-Audio Framework for Visually-Impaired Assistance

Benjamin Klein, Kazi Ruslan Rahman, Sanchita Ghose

Comments 8 pages, 7 figures. Published in the Proceedings of the 15th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2026), pages 282--289

DOI: 10.5220/0014289700004067
Journal ref: In Proceedings of the 15th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2026), pages 282--289, 2026

English abstract

Navigational aids for blind and low vision individuals struggle to convey dynamic real-world environments, leading to cognitive overload from continuous, undifferentiated feedback. We present AMAVA, a novel real-time video-to-audio framework that converts mobile device video into contextually relevant sound effects or text-to-speech descriptions. We propose a motion-aware pipeline using a lightweight AI classification model to distinguish between low and high-movement scenes followed by a real-time text-to-audio synthesis pipeline to enhance environmental perception more efficiently. In static environments, AMAVA generates spoken audio scene descriptions for situational awareness. In high-movement situations, it prioritizes safety by delivering sound cues, such as spoken hazard alerts and environmental sound effects. These audio outputs are produced by a decoder-only transformer-based vision-language model with mixture-of-experts and cross-modal attention for visual understanding, in conjunction with neural text-to-speech and natural sound synthesis networks. The proposed framework uses prompt-based caching and category-specific throttling to avoid auditory clutter and minimize latency. We present a comprehensive evaluation of the system, including a real-time navigation study comparing a white cane alone versus with AMAVA, that shows a significant increase in user confidence and perceived safety.

2604.23908 2026-04-28 cs.LG cs.SY eess.SY

Machine Learning and Deep Learning Models for Short Term Electricity Price Forecasting in Australia's National Electricity Market

Wei Lu, Jay Wang, Dingli Duan, Ding Mao, Caiyi Song, John Huang

Comments 28 pages, 5 figures

2604.23902 2026-04-28 cs.AI

LLM-Augmented Traffic Signal Control with LSTM-Based Traffic State Prediction and Safety-Constrained Decision Support

Jiazhao Shi

2604.23899 2026-04-28 cs.CV cs.LG

Mammographic Lesion Segmentation with Lightweight Models: A Comparative Study

Helder Oliveira

Comments Submitted to SPIE JMI

2604.23897 2026-04-28 cs.AI econ.GN q-fin.EC

MarketBench: Evaluating AI Agents as Market Participants

Andrey Fradkin, Rohit Krishnan

2604.23888 2026-04-28 cs.LG cs.AI

Geometry Preserving Loss Functions Promote Improved Adaptation of Blackbox Generative Model

Sinjini Mitra, Constantine Kyriakakis, Shenyuan Liang, Anuj Srivastava, Pavan Turaga

2604.23877 2026-04-28 cs.CL

Knowledge Vector of Logical Reasoning in Large Language Models

Zixuan Wang, Yuanyuan Lei

Comments Accepted to ACL 2026

2604.23875 2026-04-28 cs.CV cs.AI

Risk-Aware Robust Learning: Reducing Clinical Risk under Label Noise in Medical Image Classification

Maycon R. S. Pereira, Filipe R. Cordeiro

Comments Accepted at SBCAS'26

2604.23867 2026-04-28 cs.LG

Learning Interpretable PDE Representations for Generative Reconstructions with Structured Sparsity

Valerie Tsao, Nathaniel Chaney, Manolis Veveakis

Comments 28 pages, 20 figures

2604.23863 2026-04-28 cs.RO cs.SY eess.SY

Cooptimizing Safety and Performance Using Safety Value-Constrained Model Predictive Control

Hao Wang, Nam Nguyen, Armand Jordana, Ludovic Righetti, Somil Bansal

2604.23861 2026-04-28 cs.CV cs.AI

Empirical Ablation and Ensemble Optimization of a Convolutional Neural Network for CIFAR-10 Classification

Naser Khatti Dizabadi

2604.23860 2026-04-28 cs.CV cs.AI

Exploring Audio Hallucination in Egocentric Video Understanding

Ashish Seth, Xinhao Mei, Changsheng Zhao, Varun Nagaraja, Ernie Chang, Gregory P. Meyer, Gael Le Lan, Yunyang Xiong, Vikas Chandra, Yangyang Shi, Dinesh Manocha, Zhipeng Cai

Comments Accepted to ICASSP 2026

2604.23859 2026-04-28 cs.AI

Time-Series Forecasting in Safety-Critical Environments: An EU-AI-Act-Compliant Open-Source Package / Zeitreihenprognose in sicherheitskritischen Umgebungen: Ein KI-VO-konformes Open-Source-Paket

Thomas Bartz-Beielstein, Eva Bartz

Comments Bilingual twin paper: English version first, German original below (91 pages total). Single shared bibliography

2604.23858 2026-04-28 cs.CV

Latent Inter-Frame Pruning: A Training-Free Method Bridging Traditional Video Compression and Modern Diffusion Transformers for Efficient Generation

Dennis Menn, Chih-Hsien Chou

2604.23855 2026-04-28 cs.CL cs.SE

Learning Selective LLM Autonomy from Copilot Feedback in Enterprise Customer Support Workflows

Nikita Borovkov, Elisei Rykov, Olga Tsymboi, Sergei Filimonov, Nikita Surnachev, Dmitry Bitman, Anatolii Potapov

2604.23854 2026-04-28 cs.AI

Does Machine Unlearning Preserve Clinical Safety? A Risk Analysis for Medical Image Classification

Andreza M. C. Falcao, Filipe R. Cordeiro

Comments Accepted at SBCAS'26

2604.23844 2026-04-28 cs.CL

Translate or Simplify First: An Analysis of Cross-lingual Text Simplification in English and French

Ido Dahan, Omer Toledano, Roey J. Gafter, Sharon Pardo, Oren Tsur, Hila Zahavi, Elior Sulem

2604.23842 2026-04-28 cs.CL cs.AI

Reheat Nachos for Dinner? Evaluating AI Support for Cross-Cultural Communication of Neologisms

Dayeon Ki, Yu Hou, Rachel Rudinger, Hal Daumé, Marine Carpuat, Fumeng Yang

Comments ACL 2026 Findings

2604.23839 2026-04-28 cs.CV cs.AI

Focus on What Matters: Two-Stage ROI-Aware Refinement for Anatomy-Preserving Fetal Ultrasound Reconstruction

Ines Abbes, Mahmood Alzubaidi, Mowafa Househ, Khalid Alyafei, Marco Agus, Samir Brahim Belhaouari

Comments 18 pages, 7 figures, multiple tables. Preprint submitted to arXiv

2604.23838 2026-04-28 cs.LG

JigsawRL: Assembling RL Pipelines for Efficient LLM Post-Training

Zhengding Hu, Hehua Ouyang, Chang Chen, Zaifeng Pan, Yue Guan, Zhongkai Yu, Zhen Wang, Steven Swanson, Yufei Ding

2604.23837 2026-04-28 cs.CL cs.LG

One Size Fits None: Heuristic Collapse in LLM Investment Advice

Jillian Ross, Andrew W. Lo

2604.23824 2026-04-28 cs.CL

Resource-Lean Lexicon Induction for German Dialects

Robert Litschko, Barbara Plank, Diego Frassinelli

Comments Accepted at LREC 2026

2604.23815 2026-04-28 cs.CL

DRACULA: Hunting for the Actions Users Want Deep Research Agents to Execute

Nishant Balepur, Malachi Hamada, Varsha Kishore, Sergey Feldman, Amanpreet Singh, Pao Siangliulue, Joseph Chee Chang, Rachel Rudinger, Eunsol Choi, Jordan Lee Boyd-Graber, Doug Downey, Aakanksha Naik

Comments In-progress Preprint

2604.23814 2026-04-28 cs.CV cs.AI

Mapping License Plate Recoverability Under Extreme Viewing Angles for Opportunistic Urban Sensing

Igor Adamenko, Orpaz Ben Aharon, Yehudit Aperstein, Alexander Apartsin

Comments 18 pages, 8 figures

2604.23813 2026-04-28 cs.CV cs.CL

ShredBench: Evaluating the Semantic Reasoning Capabilities of Multimodal LLMs in Document Reconstruction

Zichun Guo, Yuling Shi, Wenhao Zeng, Chao Hu, Haotian Lin, Terry Yue Zhuo, Jiawei Chen, Xiaodong Gu, Wenping Ma

Comments ACL 2026 Findings. Code available at https://github.com/ythere-y/ShredBench

2604.23809 2026-04-28 cs.CL

LegalDrill: Diagnosis-Driven Synthesis for Legal Reasoning in Small Language Models

Tianchun Li, Haochen Liu, Vishwa Pardeshi, Xingchen Wang, Tianci Liu, Huijun Zhao, Wei Fan, Jing Gao

Comments ACL 2026 Industry Track

2604.23806 2026-04-28 cs.LG cs.AI

Symmetric Equilibrium Propagation for Thermodynamic Diffusion Training

Aditi De

2604.23804 2026-04-28 cs.LG

Reparameterization through Coverings and Topological Weight Priors

Maxim Beketov, Pavel Snopov

2604.23803 2026-04-28 cs.CV

Bringing a Personal Point of View: Evaluating Dynamic 3D Gaussian Splatting for Egocentric Scene Reconstruction

Jan Warchocki, Xi Wang, Jonas Kulhanek, Jan van Gemert

Comments Accepted at the EgoVis Workshop at CVPR 2026