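arXivDaily daily academic digest: syncs the full arXiv feed with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.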

2603.16927 2026-03-19 cs.CV eess.IV

Leveraging Large Vision Model for Multi-UAV Co-perception in Low-Altitude Wireless Networks

Yunting Xu, Jiacheng Wang, Ruichen Zhang, Changyuan Zhao, Yinqiu Liu, Dusit Niyato, Liang Yu, Haibo Zhou, Dong In Kim

Abstract

Multi-uncrewed aerial vehicle (UAV) cooperative perception has emerged as a promising paradigm for diverse low-altitude economy applications, where complementary multi-view observations are leveraged to enhance perception performance via wireless communications. However, the massive visual data generated by multiple UAVs poses significant challenges in terms of communication latency and resource efficiency. To address these challenges, this paper proposes a communication-efficient cooperative perception framework, termed Base-Station-Helped UAV (BHU), which reduces communication overhead while enhancing perception performance. Specifically, we employ a Top-K selection mechanism to identify the most informative pixels from UAV-captured RGB images, enabling sparsified visual transmission with reduced data volume and latency. The sparsified images are transmitted to a ground server via multi-user MIMO (MU-MIMO), where a Swin-large-based MaskDINO encoder extracts bird's-eye-view (BEV) features and performs cooperative feature fusion for ground vehicle perception. Furthermore, we develop a diffusion model-based deep reinforcement learning (DRL) algorithm to jointly select cooperative UAVs, sparsification ratios, and precoding matrices, achieving a balance between communication efficiency and perception utility. Simulation results on the Air-Co-Pred dataset demonstrate that, compared with traditional CNN-based BEV fusion baselines, the proposed BHU framework improves perception performance by over 5% while reducing communication overhead by 85%, providing an effective solution for multi-UAV cooperative perception under resource-constrained wireless environments.
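A minimal sketch of Top-K pixel sparsification, under stated assumptions: the abstract does not say how pixel informativeness is scored, so the hypothetical helper `topk_sparsify` takes a per-pixel score map as given and keeps the highest-scoring fraction of pixels as (row, column, value) triples. The `ratio` argument stands in for the sparsification ratio that the paper's DRL agent selects; `densify` is a hypothetical receiver-side step, and MU-MIMO transmission and BEV fusion are not modeled.

```python
import heapq
import math
from typing import List, Tuple

Sparse = List[Tuple[int, int, int]]  # (row, col, pixel value) triples


def topk_sparsify(scores: List[List[float]], pixels: List[List[int]], ratio: float) -> Sparse:
    """Keep the ceil(ratio * H * W) highest-scoring pixels (hypothetical helper)."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    h, w = len(scores), len(scores[0])
    k = max(1, math.ceil(ratio * h * w))
    # Rank every pixel by score; equal scores are ordered deterministically by position.
    candidates = ((scores[r][c], r, c) for r in range(h) for c in range(w))
    top = heapq.nlargest(k, candidates)
    # Sort by position so the payload can be scanned in raster order.
    return sorted((r, c, pixels[r][c]) for _, r, c in top)


def densify(sparse: Sparse, h: int, w: int, fill: int = 0) -> List[List[int]]:
    """Rebuild a dense image at the receiver, filling dropped pixels with `fill`."""
    image = [[fill] * w for _ in range(h)]
    for r, c, value in sparse:
        image[r][c] = value
    return image


if __name__ == "__main__":
    scores = [[0.1, 0.9], [0.5, 0.2]]
    pixels = [[10, 20], [30, 40]]
    kept = topk_sparsify(scores, pixels, ratio=0.5)
    print(kept)                 # [(0, 1, 20), (1, 0, 30)]
    print(densify(kept, 2, 2))  # [[0, 20], [30, 0]]
```

With `ratio=0.5` on a 2×2 image, only the two highest-scoring pixels survive; a lower ratio shrinks the payload further at the cost of discarding more visual detail.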

2603.16926 2026-03-19 cs.SD cs.AI eess.AS

Music Source Restoration with Ensemble Separation and Targeted Reconstruction

Xinlong Deng, Yu Xia, Jie Jiang

2603.16917 2026-03-19 cs.LG

HoloByte: Continuous Hyperspherical Distillation for Tokenizer-Free Modeling

Vladimer Khasia

Abstract

Sequence modeling universally relies on discrete subword tokenization to circumvent the $\mathcal{O}(N^2)$ computational intractability of native byte-level attention. However, this heuristic quantization imposes artificial morphological boundaries, enforces vocabulary dependence, and fractures the continuity of the optimization landscape. To resolve this dichotomy, we introduce HoloByte: a strictly tokenizer-free framework utilizing Continuous Hyperspherical Distillation. HoloByte partitions discrete byte sequences into fixed-capacity chunks and projects them into a continuous, strictly bounded hyperspherical manifold via an invertible, dimension-preserving orthogonal rotation operator. This spatial superposition allows a macroscopic transformer to operate exclusively on compressed continuous representations, formally reducing the exact attention time complexity from $\mathcal{O}(N^2D)$ to $\mathcal{O}\left( \frac{N^2}{W^2}D + ND^2 \right)$. A localized causal micro-decoder subsequently unbinds these representations to compute exact byte-level distributions. To govern this continuous trajectory, we propose a dual-objective formulation incorporating a mathematically precise Holographic Latent Mean Squared Error, which strictly bounds the gradient and guarantees asymptotic stability. Theoretically, we derive the minimal embedding dimension $D = \Omega(W \ln |\mathcal{V}|)$ required to ensure error-free discrete recovery from the continuous manifold. Empirically, under strictly matched parameter constraints, HoloByte systematically outperforms a comparable discrete Byte-Pair Encoding (BPE) baseline. These results establish Continuous Hyperspherical Distillation as a mathematically rigorous and computationally tractable foundation for vocabulary-invariant sequence modeling. The code is available at https://github.com/VladimerKhasia/HoloByte
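The complexity claim can be checked with a back-of-the-envelope operation count. This is a hypothetical illustration, not code from the HoloByte repository: `chunk_bytes` splits a byte string into fixed-capacity chunks of $W$ bytes (treating $W$ as the chunk capacity and zero-padding the last chunk are assumptions), and the two cost functions evaluate the abstract's expressions $N^2 D$ and $\frac{N^2}{W^2}D + ND^2$ with constant factors dropped. `min_dim_scale` evaluates $W \ln |\mathcal{V}|$, which the bound $D = \Omega(W \ln |\mathcal{V}|)$ fixes only up to an unspecified constant.

```python
import math


def chunk_bytes(data: bytes, w: int, pad: int = 0) -> list[bytes]:
    """Split data into chunks of exactly w bytes, right-padding the final chunk (assumed policy)."""
    chunks = [data[i:i + w] for i in range(0, len(data), w)]
    if chunks and len(chunks[-1]) < w:
        chunks[-1] += bytes([pad]) * (w - len(chunks[-1]))
    return chunks


def byte_attention_cost(n: int, d: int) -> int:
    """Exact attention over n bytes of width d: N^2 * D, constants dropped."""
    return n * n * d


def chunked_cost(n: int, d: int, w: int) -> int:
    """Attention over ceil(n / w) chunks plus a per-byte D^2 term: (N/W)^2 * D + N * D^2."""
    num_chunks = (n + w - 1) // w
    return num_chunks * num_chunks * d + n * d * d


def min_dim_scale(w: int, vocab_size: int) -> float:
    """W * ln|V|: the growth rate in D = Omega(W ln|V|); the hidden constant is unknown."""
    return w * math.log(vocab_size)


if __name__ == "__main__":
    print(chunk_bytes(b"abcde", 2))        # [b'ab', b'cd', b'e\x00']
    n, d, w = 8192, 512, 8                 # illustrative sizes, not from the paper
    print(byte_attention_cost(n, d) / chunked_cost(n, d, w))  # 12.8
    print(round(min_dim_scale(w, 256), 2))  # 44.36
```

For these illustrative sizes the chunked count is 12.8 times smaller. Because the ratio equals $N / (N/W^2 + D)$, the $ND^2$ term caps the saving below $N/D$ no matter how large $W$ becomes.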

2603.16914 2026-03-19 cs.SD cs.AI cs.CL eess.AS

Quantizer-Aware Hierarchical Neural Codec Modeling for Speech Deepfake Detection

Jinyang Wu, Zihan Pan, Qiquan Zhang, Sailor Hardik Bhupendra, Soumik Mondal

Comments 5 pages, 3 figures

2603.16911 2026-03-19 cs.LG cs.AI

What on Earth is AlphaEarth? Hierarchical structure and functional interpretability for global land cover

Ivan Felipe Benavides-Martinez, Justin Guthrie, Jhon Edwin Arias, Yeison Alberto Garces-Gomez, Angela Ines Guzman-Alvis, Cristiam Victoriano Portilla-Cabrera, Somnath Mondal, Andrew J. Allyn, Auroop R. Ganguly

Abstract

Geospatial foundation models generate high-dimensional embeddings that achieve strong predictive performance, yet their internal organization remains obscure, limiting their scientific use. Recent interpretability studies relate Google AlphaEarth Foundations (GAEF) embeddings to continuous environmental variables, but it is still unclear whether the embedding space exhibits a functional or hierarchical organization, in which some dimensions act as specialized representations while others encode shared or broader geospatial structure. In this work, we propose a functional interpretability framework that reverse-engineers the role of embedding dimensions by characterizing their contribution to land cover structure from observed classification behavior. The approach combines large-scale experimentation with a structural analysis of embedding-class relationships based on feature importance patterns and progressive ablation. Our results show that embedding dimensions exhibit consistent and non-uniform functional behavior, allowing them to be categorized along a hierarchical functional spectrum: specialist dimensions associated with specific land cover classes, low- and mid-generalist dimensions capturing shared characteristics between classes, and high-generalist dimensions reflecting broader environmental gradients. Critically, we find that accurate land cover classification (98% of baseline performance) can be achieved using as few as 2 to 12 of the 64 available dimensions, depending on the class. This demonstrates substantial redundancy in the embedding space and offers a pathway toward significant reductions in computational cost. Together, these findings reveal that AlphaEarth embeddings are not only physically informative, but also functionally organized into a hierarchical structure, providing practical guidance for dimension selection in operational classification tasks.
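The per-class result (2 to 12 of 64 dimensions reaching 98% of baseline performance) suggests a simple search, sketched here under assumptions: dimensions are added in descending order of a given importance score until a classifier restricted to them reaches a target fraction of the full-embedding baseline. `min_dims_for_target` and its `evaluate` callback are hypothetical; the paper combines feature-importance patterns with progressive ablation, and its actual procedure may differ in search direction and stopping rule.

```python
from typing import Callable, List, Sequence


def min_dims_for_target(
    importance: Sequence[float],
    evaluate: Callable[[List[int]], float],
    baseline: float,
    target_frac: float = 0.98,
) -> List[int]:
    """Greedily add dimensions by descending importance until evaluate() >= target_frac * baseline.

    Returns the selected dimension indices, or every dimension if the target is never reached.
    """
    # sorted() is stable, so equally important dimensions keep their original order.
    order = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    selected: List[int] = []
    for dim in order:
        selected.append(dim)
        if evaluate(list(selected)) >= target_frac * baseline:
            break
    return selected


if __name__ == "__main__":
    imp = [0.1, 0.7, 0.2]
    total = sum(imp)

    def toy_eval(dims: List[int]) -> float:
        # Toy stand-in for "train a classifier on these dimensions and score it":
        # the score is the share of total importance covered by the chosen dimensions.
        return sum(imp[i] for i in dims) / total

    print(min_dims_for_target(imp, toy_eval, baseline=1.0, target_frac=0.85))  # [1, 2]
```

In a real pipeline `evaluate` would retrain or refit a classifier for each candidate subset, so the greedy order keeps the number of fits linear in the number of dimensions.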

2603.16901 2026-03-19 cs.LG cs.AI

From Language to Action in Arabic: Reliable Structured Tool Calling via Data-Centric Fine-Tuning

Omer Nacar, Deema Alquffari, Saleh Alsharideh, Adeem AlOtaibi, Abdulaziz Alabdulkarim, Leen Alhazmi, Nada Alomar, Wareef Alzubaidi, Nada Alsultan, Ahmed Alrabghi, Demah Alhoshan, Rana Alsayyari, Hamed Alruwaili, Albaraa Jaafar, Khaled Alusmani, Abdulaziz Alsohimy, Munirah Alsubaie, Shahd Aldukhayil, Arwa Alali, Yazeed BinShihah, Razan Alsulaymi, Nourah Alhumaid, Razan Abdulsalam, Reem Alamoudi, Mohammed Alkhalifa

2603.16889 2026-03-19 cs.CL cs.AI cs.SD eess.AS

Rubric-Guided Fine-tuning of SpeechLLMs for Multi-Aspect, Multi-Rater L2 Reading-Speech Assessment

Aditya Kamlesh Parikh, Cristian Tejedor-Garcia, Catia Cucchiarini, Helmer Strik

Comments Accepted to LREC 2026. This publication is part of the project Responsible AI for Voice Diagnostics (RAIVD) with file number NGF.1607.22.013 of the research programme NGF AiNed Fellowship Grants, which is financed by the Dutch Research Council (NWO)

2603.16888 2026-03-19 cs.LG cs.AI

Multi-Agent Reinforcement Learning for Dynamic Pricing: Balancing Profitability, Stability and Fairness

Krishna Kumar Neelakanta Pillai Santha Kumari Amma

2603.16883 2026-03-19 cs.CV cs.CL cs.LG eess.SP

Tokenization vs. Augmentation: A Systematic Study of Writer Variance in IMU-Based Online Handwriting Recognition

Jindong Li, Dario Zanca, Vincent Christlein, Tim Hamann, Jens Barth, Peter Kämpf, Björn Eskofier

2603.16881 2026-03-19 cs.LG

Federated Multi Agent Deep Learning and Neural Networks for Advanced Distributed Sensing in Wireless Networks

Nadine Muller, Stefano DeRosa, Su Zhang, Chun Lee Huan

2603.16878 2026-03-19 cs.LG cs.AI eess.SP

A foundation model for electrodermal activity data

Leonardo Alchieri, Matteo Garzon, Lidia Alecci, Francesco Bombassei De Bona, Martin Gjoreski, Giovanni De Felice, Silvia Santini

2603.16872 2026-03-19 cs.CL cs.CY

Trust, Safety, and Accuracy: Assessing LLMs for Routine Maternity Advice

V Sai Divya, A Bhanusree, Rimjhim, K Venkata Krishna Rao

2603.16806 2026-03-19 cs.RO cs.AI

DexGrasp-Zero: A Morphology-Aligned Policy for Zero-Shot Cross-Embodiment Dexterous Grasping

Yuliang Wu, Yanhan Lin, WengKit Lao, Yuhao Lin, Yi-Lin Wei, Wei-Shi Zheng, Ancong Wu

2603.16600 2026-03-19 cs.CV

Rationale Matters: Learning Transferable Rubrics via Proxy-Guided Critique for VLM Reward Models

Weijie Qiu, Dai Guan, Junxin Wang, Zhihang Li, Yongbo Gai, Mengyu Zhou, Erchao Zhao, Xiaoxi Jiang, Guanjun Jiang

Comments 25 pages, 10 figures

2603.16583 2026-03-19 cs.LG

Trajectory-Optimized Time Reparameterization for Learning-Compatible Reduced-Order Modeling of Stiff Dynamical Systems

Joe Standridge, Daniel Livescu, Paul Cizmas

Abstract

Stiff dynamical systems present a challenge for machine-learning reduced-order models (ML-ROMs), as explicit time integration becomes unstable in stiff regimes while implicit integration within learning loops is computationally expensive and often degrades training efficiency. Time reparameterization (TR) offers an alternative by transforming the independent variable so that rapid physical-time transients are spread over a stretched-time coordinate, enabling stable explicit integration on uniformly sampled grids. Although several TR strategies have been proposed, their effect on learnability in ML-ROMs remains incompletely understood. This work investigates time reparameterization as a stiffness-mitigation mechanism for neural ODE reduced-order modeling and introduces a trajectory-optimized TR (TOTR) formulation. The proposed approach casts time reparameterization as an optimization problem in arc-length coordinates, in which a traversal-speed profile is selected to penalize acceleration in stretched time. By targeting the smoothness of the training dynamics, this formulation produces reparameterized trajectories that are better conditioned and easier to learn than existing TR methods. TOTR is evaluated on three stiff problems: a parameterized stiff linear system, the van der Pol oscillator, and the HIRES chemical kinetics model. Across all cases, under identical training regimens, the proposed approach yields smoother reparameterizations and better physical-time predictions than other TR approaches. Quantitative results demonstrate loss reductions of one to two orders of magnitude compared to benchmark algorithms. These results highlight that effective stiffness mitigation in ML-ROMs depends critically on the regularity and learnability of the time map itself, and that optimization-based TR provides a robust framework for explicit reduced-order modeling of multiscale dynamical systems.
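For intuition about why reparameterization helps, the sketch below implements plain arc-length time reparameterization, a simpler baseline and not TOTR itself: it computes the cumulative arc length of the sampled curve $(t, x(t))$ and returns the physical times that sit at uniformly spaced arc-length values, so a fast transient receives more of the samples. TOTR instead optimizes a traversal-speed profile to penalize acceleration in stretched time; the function name `arc_length_time_map`, the inclusion of $t$ in the arc length, and linear interpolation are all assumptions made here.

```python
import bisect
import math
from typing import List, Sequence


def arc_length_time_map(t: Sequence[float], x: Sequence[Sequence[float]], n_out: int) -> List[float]:
    """Physical times at n_out points evenly spaced in arc length of the curve (t, x(t))."""
    if len(t) < 2 or n_out < 2:
        raise ValueError("need at least two samples and two output points")
    # Cumulative arc length; including t keeps s strictly increasing even where x is flat.
    s = [0.0]
    for i in range(1, len(t)):
        dt = t[i] - t[i - 1]
        if dt <= 0:
            raise ValueError("t must be strictly increasing")
        step_sq = dt * dt + sum((a - b) ** 2 for a, b in zip(x[i], x[i - 1]))
        s.append(s[-1] + math.sqrt(step_sq))
    times = []
    for k in range(n_out):
        target = k * s[-1] / (n_out - 1)
        # Find the segment [s[j], s[j+1]] containing target, then interpolate t linearly.
        j = min(bisect.bisect_right(s, target) - 1, len(s) - 2)
        frac = (target - s[j]) / (s[j + 1] - s[j])
        times.append(t[j] + frac * (t[j + 1] - t[j]))
    return times


if __name__ == "__main__":
    # Flat on [0, 1], then a jump of 10 on [1, 2]: a crude stand-in for a stiff transient.
    jump = arc_length_time_map([0.0, 1.0, 2.0], [[0.0], [0.0], [10.0]], n_out=5)
    print(sum(1 for v in jump if v >= 1.0))  # 4
```

Four of the five output times land inside the jump interval [1, 2], which is the stretching of rapid transients the abstract describes; on a trajectory with no change in $x$, the map reduces to uniform physical-time sampling.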

2603.16506 2026-03-19 cs.CV

VIEW2SPACE: Studying Multi-View Visual Reasoning from Sparse Observations

Fucai Ke, Zhixi Cai, Boying Li, Long Chen, Beibei Lin, Weiqing Wang, Pari Delir Haghighi, Gholamreza Haffari, Hamid Rezatofighi

2603.16448 2026-03-19 cs.AI

TRUST-SQL: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas

Ai Jian, Xiaoyun Zhang, Wanrou Du, Jingqing Ruan, Jiangbo Pei, Weipeng Zhang, Ke Zeng, Xunliang Cai

2603.16421 2026-03-19 cs.CV

HGP-Mamba: Integrating Histology and Generated Protein Features for Mamba-based Multimodal Survival Risk Prediction

Jing Dai, Chen Wu, Ming Wu, Qibin Zhang, Zexi Wu, Jingdong Zhang, Hongming Xu

Comments Accepted at IEEE ICME 2026. This arXiv version includes additional supplementary experiments and extended discussions beyond the conference version

2603.16301 2026-03-19 cs.RO

OGScene3D: Incremental Open-Vocabulary 3D Gaussian Scene Graph Mapping for Scene Understanding

Siting Zhu, Ziyun Lu, Guangming Wang, Chenguang Huang, Yongbo Chen, I-Ming Chen, Wolfram Burgard, Hesheng Wang

2603.16292 2026-03-19 cs.CL cs.AI

Attention-guided Evidence Grounding for Spoken Question Answering

Ke Yang, Bolin Chen, Yuejie Li, Yueying Hua, Jianhao Nie, Yueping He, Bowen Li, Chengjun Mao

Comments Accepted to ICME 2026

2603.16270 2026-03-19 cs.RO

MG-Grasp: Metric-Scale Geometric 6-DoF Grasping Framework with Sparse RGB Observations

Kangxu Wang, Siang Chen, Chenxing Jiang, Shaojie Shen, Yixiang Dai, Guijin Wang

Comments 8 pages, 5 figures

2603.16195 2026-03-19 cs.CV cs.RO

S-VAM: Shortcut Video-Action Model by Self-Distilling Geometric and Semantic Foresight

Haodong Yan, Zhide Zhong, Jiaguan Zhu, Junjie He, Weilin Yuan, Wenxuan Song, Xin Gong, Yingjie Cai, Guanyi Zhao, Xu Yan, Bingbing Liu, Ying-Cong Chen, Haoang Li

2603.16021 2026-03-19 cs.AI cs.HC

Interpretable Context Methodology: Folder Structure as Agentic Architecture

Jake Van Clief, David McDermott

Comments 28 pages, 5 figures, 2 tables, 54 references

2603.15936 2026-03-19 cs.CL

CTG-DB: An Ontology-Based Transformation of ClinicalTrials.gov to Enable Cross-Trial Drug Safety Analyses

Jeffery L. Painter, François Haguinet, Andrew Bate

Comments 10 pages, 2 figures. Submitted to the 2026 AMIA Annual Symposium

2603.15886 2026-03-19 cs.LG cs.AI

PhasorFlow: A Python Library for Unit Circle Based Computing

Dibakar Sigdel, Namuna Panday

2603.15797 2026-03-19 cs.LG cs.AI

OMNIFLOW: A Physics-Grounded Multimodal Agent for Generalized Scientific Reasoning

Hao Wu, Yongheng Zhang, Yuan Gao, Fan Xu, Fan Zhang, Ruobing Xie, Ruijian Gou, Yuxuan Liang, Xiaomeng Huang, Xian Wu

2603.15674 2026-03-19 cs.AI cs.IT cs.LG math.IT stat.ML

Theoretical Foundations of Latent Posterior Factors: Formal Guarantees for Multi-Evidence Reasoning

Aliyu Agboola Alege

Comments 30 pages, 8 figures, 10 tables. Theoretical characterization of the Latent Posterior Factors (LPF) framework for multi-evidence probabilistic reasoning, with formal guarantees and empirical validation

2603.15670 2026-03-19 cs.AI cs.LG

I Know What I Don't Know: Latent Posterior Factor Models for Multi-Evidence Probabilistic Reasoning

Aliyu Agboola Alege

Comments 202 pages, 52 figures, 105 tables. Comprehensive presentation of the Latent Posterior Factors (LPF) framework for multi-evidence probabilistic reasoning, including theoretical analysis, algorithmic design, and extensive empirical evaluation across synthetic and real-world benchmarks

2603.15639 2026-03-19 cs.AI

The Comprehension-Gated Agent Economy: A Robustness-First Architecture for AI Economic Agency

Rahul Baxi

Comments v2: updated correspondence email

2603.15248 2026-03-19 cs.LG cs.SY eess.SY

Mechanistic Foundations of Goal-Directed Control

Alma Lago