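arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and related fields.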

2505.20032 2026-04-30 cs.CV cs.LG cs.RO

ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers

Fotios Lygerakis, Ozan Özdenizci, Elmar Rückert


Abstract

Tactile sensing provides local essential information that is complementary to visual perception, such as texture, compliance, and force. Despite recent advances in visuotactile representation learning, challenges remain in fusing these modalities and generalizing across tasks and environments without heavy reliance on pre-trained vision-language models. Moreover, existing methods do not study positional encodings, thereby overlooking the multi-stage spatial reasoning needed to capture fine-grained visuotactile correlations. We introduce ViTaPEs, a transformer-based architecture for learning task-agnostic visuotactile representations from paired vision and tactile inputs. Our key idea is a two-stage positional injection: local (modality-specific) positional encodings are added within each stream, and a global positional encoding is added on the joint token sequence immediately before attention, providing a shared positional vocabulary at the stage where cross-modal interaction occurs. We make the positional injection points explicit and conduct controlled ablations that isolate their effect before a token-wise nonlinearity versus immediately before self-attention. Experiments on multiple large-scale real-world datasets show that ViTaPEs not only surpasses state-of-the-art baselines across various recognition tasks but also demonstrates zero-shot generalization to unseen, out-of-domain scenarios. We further demonstrate the transfer-learning strength of ViTaPEs in a robotic grasping task, where it outperforms state-of-the-art baselines in predicting grasp success. Project page: https://sites.google.com/view/vitapes
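The two-stage positional injection described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`sinusoidal_pe`, `add_rows`, `two_stage_injection`) are hypothetical, and fixed sinusoidal encodings are an assumption chosen only so the example runs without training, since the abstract does not say which encoding family is used. The token-wise nonlinearity and the attention layers are omitted.

```python
import math


def sinusoidal_pe(num_tokens, dim):
    """Fixed sinusoidal positional encodings, one row per token position.

    Assumption: sinusoids are a stand-in; the abstract does not specify
    the encoding family.
    """
    table = []
    for pos in range(num_tokens):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_rows(tokens, pe):
    """Element-wise sum of two equally shaped lists of vectors."""
    return [[t + p for t, p in zip(t_row, p_row)]
            for t_row, p_row in zip(tokens, pe)]


def two_stage_injection(vision_tokens, tactile_tokens):
    """Add local encodings per stream, then a global one on the joint sequence."""
    dim = len(vision_tokens[0])
    # Stage 1: local encodings, indexed separately inside each stream.
    vision = add_rows(vision_tokens, sinusoidal_pe(len(vision_tokens), dim))
    tactile = add_rows(tactile_tokens, sinusoidal_pe(len(tactile_tokens), dim))
    # Stage 2: one global encoding indexed over the concatenated sequence,
    # applied where self-attention would consume the tokens.
    joint = vision + tactile
    return add_rows(joint, sinusoidal_pe(len(joint), dim))


# Toy input: 2 vision tokens and 3 tactile tokens, all zeros, width 4.
vision = [[0.0] * 4 for _ in range(2)]
tactile = [[0.0] * 4 for _ in range(3)]
joint = two_stage_injection(vision, tactile)
print(len(joint), joint[0])
```

With all-zero inputs, each output row is just the sum of that token's local and global encodings, which makes it easy to see that the first tactile token has local position 0 but global position 2.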


2505.18441 2026-04-30 cs.LG cs.MS stat.AP

DB-KSVD: Scalable Alternating Optimization for Disentangling High-Dimensional Embedding Spaces

Romeo Valentin, Sydney M. Katz, Vincent Vanhoucke, Mykel J. Kochenderfer

Comments 8 pages + 10 pages appendix. Updated with additional vision transformer experiments

2505.17342 2026-04-30 cs.LG

A Survey of Safe Reinforcement Learning and Constrained MDPs: A Technical Survey on Single-Agent and Multi-Agent Safety

Ankita Kushwaha, Kiran Ravish, Preeti Lamba, Pawan Kumar

Comments 25

2505.13963 2026-04-30 cs.CL cs.LG

Through a Compressed Lens: Investigating The Impact of Quantization on Factual Knowledge Recall

Qianli Wang, Mingyang Wang, Nils Feldhus, Simon Ostermann, Yuan Cao, Hinrich Schütze, Sebastian Möller, Vera Schmitt

Comments TrustNLP @ ACL 2026; camera-ready version

2505.11669 2026-04-30 cs.LG cs.AI

OT Score: An OT based Confidence Score for Prototype-Assisted Source Free Unsupervised Domain Adaptation

Yiming Zhang, Sitong Liu, Alex Cloninger

2505.10924 2026-04-30 cs.CL cs.AI cs.CR cs.CV cs.SE

A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron?

Ada Chen, Yongjiang Wu, Junyuan Zhang, Jingyu Xiao, Shu Yang, Jen-tse Huang, Kun Wang, Wenxuan Wang, Shuai Wang

Comments Accepted by ACL 2026

2504.18662 2026-04-30 cs.RO cs.AI

M2R2: MultiModal Robotic Representation for Temporal Action Segmentation

Daniel Sliwowski, Dongheui Lee

Comments 8 pages, 6 figures, 2 tables

2504.13529 2026-04-30 cs.LG cs.SY eess.SY q-fin.CP q-fin.PM

Improving Bayesian Optimization for Portfolio Management with an Adaptive Scheduling

Zinuo You, John Cartlidge, Karen Elliott, Menghan Ge, Daniel Gold

Comments 5 pages, 2 figures; version of record. ICAAI 2025, 9th International Conference on Advances in Artificial Intelligence (ICAAI 2025), November 14-16, 2025, Manchester, United Kingdom. ACM, New York, NY, USA, pages 21-25. Version 4, code repository added: https://github.com/pixelhero98/TPE-AS


DOI: 10.1145/3787279.3787285
Journal ref: In 2025 9th International Conference on Advances in Artificial Intelligence (ICAAI 2025), November 14-16, 2025, Manchester, United Kingdom. ACM, New York, NY, USA, pages 21-25

Abstract

Existing black-box portfolio management systems are prevalent in the financial industry due to commercial and safety constraints, though their performance can fluctuate dramatically with changing market regimes. Evaluating these non-transparent systems is computationally expensive, as fixed budgets limit the number of possible observations. Therefore, achieving stable and sample-efficient optimization for these systems has become a critical challenge. This work presents a novel Bayesian optimization framework (TPE-AS) that improves search stability and efficiency for black-box portfolio models under these limited observation budgets. Standard Bayesian optimization, which solely maximizes expected return, can yield erratic search trajectories and misalign the surrogate model with the true objective, thereby wasting the limited evaluation budget. To mitigate these issues, we propose a weighted Lagrangian estimator that leverages an adaptive schedule and importance sampling. This estimator dynamically balances exploration and exploitation by incorporating both the maximization of model performance and the minimization of the variance of model observations. It guides the search from broad, performance-seeking exploration towards stable and desirable regions as the optimization progresses. Extensive experiments and ablation studies, which establish our proposed method as the primary approach and other configurations as baselines, demonstrate its effectiveness across four backtest settings with three distinct black-box portfolio management models.
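The trade-off the abstract describes, rewarding performance while increasingly penalizing the variance of observations as the budget is spent, can be sketched as a penalized score. This is an illustration only, not the TPE-AS estimator: the linear schedule, the names `schedule_weight` and `penalized_score`, and the value `lam_max=5.0` are all assumptions, and the importance-sampling component is left out.

```python
from statistics import mean, pvariance


def schedule_weight(step, budget, lam_max):
    """Hypothetical linear schedule: 0 at the first step, lam_max at the last."""
    if budget <= 1:
        return lam_max
    return lam_max * step / (budget - 1)


def penalized_score(returns, step, budget, lam_max=5.0):
    """Mean observed return minus a variance penalty that grows with the step."""
    weight = schedule_weight(step, budget, lam_max)
    return mean(returns) - weight * pvariance(returns)


# Two toy configurations: higher mean but volatile, lower mean but steady.
volatile = [0.10, 0.30]
steady = [0.18, 0.18]

early = (penalized_score(volatile, 0, 10), penalized_score(steady, 0, 10))
late = (penalized_score(volatile, 9, 10), penalized_score(steady, 9, 10))
print(early, late)
```

Early in the budget the volatile configuration scores higher because only the mean counts; by the final step the variance penalty makes the steady configuration preferable, mirroring the shift from performance-seeking exploration towards stable regions.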


2504.09925 2026-04-30 cs.CV

FLARE: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding

Zheng Liu, Mengjie Liu, Jingzhou Chen, Jingwei Xu, Bin Cui, Conghui He, Wentao Zhang

2503.23365 2026-04-30 cs.CV cs.RO

OnSiteVRU: A High-Resolution Trajectory Dataset for High-Density Vulnerable Road Users

Zhangcun Yan, Jianqiang Li, Peng Hang, Jian Sun

2503.20749 2026-04-30 cs.CL

Can LLM Agents Simulate Multi-Turn Human Behavior? Evidence from Real Online Customer Behavior Data

Yuxuan Lu, Jing Huang, Yan Han, Bingsheng Yao, Sisong Bei, Jiri Gesi, Yaochen Xie, Yisi Sang, Zheshen Wang, Qi He, Dakuo Wang

2503.04872 2026-04-30 cs.CL cs.AI

TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation

Lin Sun, Guangxiang Zhao, Xiaoqi Jian, Yuhan Wu, Weihong Lin, Yongfu Zhu, Qilong Shi, Change Jia, Aomufei Yuan, Yuxuan Tian, Linglin Zhang, Jinzhu Wu, Junfeng Ran, Sai-er Hu, Zihan Jiang, Junting Zhou, Wenrui Liu, Xusen Xiao, Bin Cui, Tong Yang, Xiangzheng Zhang

Comments Preprint

2502.14912 2026-04-30 cs.CL cond-mat.mtrl-sci cs.LG

Semantic Embeddings of Chemical Elements for Enhanced Materials Inference and Discovery

Yunze Jia, Yuehui Xian, Yangyang Xu, Pengfei Dang, Xiangdong Ding, Jun Sun, Yumei Zhou, Dezhen Xue

Comments v2: Updated to the published version in Materials Genome Engineering Advances (2026)

2502.05907 2026-04-30 cs.RO

EvolvingAgent: Curriculum Self-evolving Agent with Continual World Model for Long-Horizon Tasks

Tongtong Feng, Xin Wang, Zekai Zhou, Ren Wang, Yuwei Zhan, Guangyao Li, Qing Li, Wenwu Zhu

2412.13421 2026-04-30 cs.SD eess.AS

Explainable Detection of Machine Generated Music and Early Systematic Evaluation

Yupei Li, Qiyang Sun, Hanqian Li, Lucia Specia, Björn W. Schuller

Comments Accepted at Scientific Reports


DOI: 10.1038/s41598-026-42133-7
Journal ref: Sci Rep 16, 13757 (2026)

Abstract

Machine-generated music (MGM) has become a groundbreaking innovation with wide-ranging applications, such as music therapy, personalised editing, and creative inspiration within the music industry. However, the unregulated proliferation of MGM presents considerable challenges to the entertainment, education, and arts sectors by potentially undermining the value of high-quality human compositions. Consequently, MGM detection (MGMD) is crucial for preserving the integrity of these fields. Despite its significance, the MGMD domain lacks the comprehensive systematic evaluation results necessary to drive meaningful progress. To address this gap, we conduct experiments on existing large-scale datasets using a range of foundational models for audio processing, establishing systematic evaluation results tailored to the MGMD task. Our selection includes traditional machine learning models, deep neural networks, Transformer-based architectures, and state space models (SSMs). Recognising the inherently multimodal nature of music, which integrates both melody and lyrics, we also explore fundamental multimodal models in our experiments. Beyond providing basic binary classification outcomes, we delve deeper into model behaviour using multiple explainable Artificial Intelligence (XAI) tools, offering insights into their decision-making processes. Our analysis reveals that ResNet18 performs the best according to in-domain and out-of-domain tests. By providing a comprehensive comparison of systematic evaluation results and their interpretability, we propose several directions to inspire future research to develop more robust and effective detection methods for MGM. We provide our code and some samples in a GitHub repository.


2411.17838 2026-04-30 cs.LG

Rock the KASBA: Blazingly Fast and Accurate Time Series Clustering

Christopher Holder, Anthony Bagnall

2411.13365 2026-04-30 cs.AI cs.LG cs.RO cs.SY eess.SY

Explainable Representation of Finite-Memory Policies for POMDPs using Decision Trees

Muqsit Azeem, Debraj Chakraborty, Sudeep Kanav, Jan Kretinsky

Comments Full version of the extended abstract accepted at AAMAS 2026

2411.06498 2026-04-30 cs.AI cs.CC

Barriers to Complexity-Theoretic Proofs that "AGI" Using Machine Learning is Impossible

Michael Guerzhoy

2409.01115 2026-04-30 cs.LG

Time series classification with random convolution kernels: pooling operators and input representations matter

Mouhamadou Mansour Lo, Gildas Morvan, Mathieu Rossi, Fabrice Morganti, David Mercier

Comments v1: initial version, incorrect evaluation. v2: Method improved, evaluation corrected, title simplified. v3: Add acknowledgments. v4: text correction

2409.00557 2026-04-30 cs.CL cs.AI cs.SE

Learning to Ask: When LLM Agents Meet Unclear Instruction

Wenxuan Wang, Juluan Shi, Zixuan Ling, Yuk-Kit Chan, Chaozheng Wang, Cheryl Lee, Youliang Yuan, Jen-tse Huang, Wenxiang Jiao, Michael R. Lyu

Comments EMNLP 2025

2407.01846 2026-04-30 cs.CV

Investigating the Segment Anything Foundation Model for Mapping Smallholder Agriculture Field Boundaries Without Training Labels

Pratyush Tripathy, Kathy Baylis, Kyle Wu, Jyles Watson, Ruizhe Jiang

Comments 11 pages, 6 main figures, 7 supplementary figures

2405.13729 2026-04-30 cs.LG cs.AI cs.CV cs.GR

ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models

Rui Xu, Jiepeng Wang, Hao Pan, Yang Liu, Xin Tong, Shiqing Xin, Changhe Tu, Taku Komura, Wenping Wang

Comments ACM Transactions on Graphics, SIGGRAPH 2026

2404.09688 2026-04-30 cs.RO

Neural-Geometric Tunnel Traversal: Localization-free UAV Flight with Tilted LiDARs

Lorenzo Cano, Alejandro R. Mosteo, Danilo Tardioli

2401.02458 2026-04-30 cs.LG cs.AI

Data-Centric Foundation Models in Computational Healthcare: A Survey

Yunkun Zhang, Jin Gao, Zheling Tan, Lingfeng Zhou, Kexin Ding, Mu Zhou, Shaoting Zhang, Dequan Wang

Comments Published in ACM Computing Surveys

2010.06164 2026-04-30 cs.AI

Causal Structure Learning: a Bayesian approach based on random graphs

Mauricio Gonzalez-Soto, Ivan R. Feliciano-Avelino, L. Enrique Sucar, Hugo J. Escalante Balderas

1907.11752 2026-04-30 cs.AI stat.ME

Choosing with unknown causal information: Action-outcome probabilities for decision making can be grounded in causal models

Mauricio Gonzalez Soto, David Danks, Hugo J. Escalante Balderas, L. Enrique Sucar

2604.26932 2026-04-30 math.OC cs.LG

Learning Over-Relaxation Policies for ADMM with Convergence Guarantees

Junan Lin, Paul J. Goulart, Luca Furieri

2604.26923 2026-04-30 cs.SE cs.CL

ClassEval-Pro: A Cross-Domain Benchmark for Class-Level Code Generation

Yeheng Chen, Chaoxiang Xie, Yuling Shi, Wenhao Zeng, Yongpan Wang, Hongyu Zhang, Xiaodong Gu

Comments Accepted to AIware 2026. Code and data available at https://github.com/ian-Kappa/ClassEval-Pro

2604.26903 2026-04-30 eess.SP cs.AI cs.AR cs.ET cs.SY eess.SY

Recent Advances in mm-Wave and Sub-THz/THz Oscillators for FutureG Technologies

Baktash Behmanesh, Ahmad Rezvanitabar

2604.26899 2026-04-30 eess.SY cs.RO cs.SY

Safe Navigation using Neural Radiance Fields via Reachable Sets

Omanshu Thapliyal, Malarvizhi Sankaranarayanasamy, Ravigopal Vennelakanti

Comments 5 pages, 8 figures, 2026 4th International Conference on Mechatronics, Control and Robotics (ICMCR)