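arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.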

2603.10724 2026-03-12 cs.CV

eLasmobranc Dataset: An Image Dataset for Elasmobranch Species Recognition and Biodiversity Monitoring

Ismael Beviá-Ballesteros, Mario Jerez-Tallón, Nieves Aranda-Garrido, Isabel Abel-Abellán, Irene Antón-Linares, Jorge Azorín-López, Marcelo Saval-Calvo, Andres Fuster-Guilló, Francisca Giménez-Casalduero

Comments 9 pages, 6 figures, 5 tables. A future extended version of this work will be submitted to Scientific Data

English abstract

Elasmobranch populations are experiencing significant global declines, and several species are currently classified as threatened. Reliable monitoring and species-level identification are essential to support conservation and spatial planning initiatives such as Important Shark and Ray Areas (ISRAs). However, existing visual datasets are predominantly detection-oriented, underwater-acquired, or limited to coarse-grained categories, restricting their applicability to fine-grained morphological classification. We present the eLasmobranc Dataset, a curated and publicly available image collection from seven ecologically relevant elasmobranch species inhabiting the eastern Spanish Mediterranean coast, a region where two ISRAs have been identified. Images were obtained through dedicated data collection, including field campaigns and collaborations with local fish markets and projects, as well as from open-access public sources. The dataset was constructed predominantly from images acquired outside the aquatic environment under standardized protocols to ensure clear visualization of diagnostic morphological traits. It integrates expert-validated species annotations, structured spatial and temporal metadata, and complementary species-level information. The eLasmobranc Dataset is specifically designed to support supervised species-level classification, population studies, and the development of artificial intelligence systems for biodiversity monitoring. By combining morphological clarity, taxonomic reliability, and public accessibility, the dataset addresses a critical gap in fine-grained elasmobranch identification and promotes reproducible research in conservation-oriented computer vision. The dataset is publicly available at https://zenodo.org/records/18549737.

2603.10715 2026-03-12 cs.RO

ASTER: Attitude-aware Suspended-payload Quadrotor Traversal via Efficient Reinforcement Learning

Dongcheng Cao, Jin Zhou, Shuo Li

2603.10714 2026-03-12 cs.RO

MAVEN: A Meta-Reinforcement Learning Framework for Varying-Dynamics Expertise in Agile Quadrotor Maneuvers

Jin Zhou, Dongcheng Cao, Xian Wang, Shuo Li

2603.10712 2026-03-12 cs.RO

FutureVLA: Joint Visuomotor Prediction for Vision-Language-Action Model

Xiaoxu Xu, Hao Li, Jinhui Ye, Yilun Chen, Jia Zeng, Xinyi Chen, Linning Xu, Dahua Lin, Weixin Li, Jiangmiao Pang

2603.10705 2026-03-12 cs.CL

Prism-$Δ$: Differential Subspace Steering for Prompt Highlighting in Large Language Models

Yuyao Ge, Shenghua Liu, Yiwei Wang, Tianyu Liu, Baolong Bi, Lingrui Mei, Jiayu Yao, Jiafeng Guo, Xueqi Cheng

Comments 21 pages, 14 figures

2603.10703 2026-03-12 cs.CV cs.CY

WalkGPT: Grounded Vision-Language Conversation with Depth-Aware Segmentation for Pedestrian Navigation

Rafi Ibn Sultan, Hui Zhu, Xiangyu Zhou, Chengyin Li, Prashant Khanduri, Marco Brocanelli, Dongxiao Zhu

Comments Accepted by CVPR-2026

2603.10702 2026-03-12 cs.CV

UniCom: Unified Multimodal Modeling via Compressed Continuous Semantic Representations

Yaqi Zhao, Wang Lin, Zijian Zhang, Miles Yang, Jingyuan Chen, Wentao Zhang, Zhao Zhong, Liefeng Bo

2603.10701 2026-03-12 cs.SD cs.AI

AlphaFlowTSE: One-Step Generative Target Speaker Extraction via Conditional AlphaFlow

Duojia Li, Shuhan Zhang, Zihan Qian, Wenxuan Wu, Shuai Wang, Qingyang Hong, Lin Li, Haizhou Li

Comments Submitted to Interspeech 2026 for review

2603.10695 2026-03-12 cs.CV cs.AI

RandMark: On Random Watermarking of Visual Foundation Models

Anna Chistyakova, Mikhail Pautov

2603.10694 2026-03-12 cs.CV

Bioinspired CNNs for border completion in occluded images

Catarina P. Coutinho, Aneeqa Merhab, Janko Petkovic, Ferdinando Zanchetta, Rita Fioresi

Comments Submitted for Publication

2603.10682 2026-03-12 cs.RO

OnFly: Onboard Zero-Shot Aerial Vision-Language Navigation toward Safety and Efficiency

Guiyong Zheng, Yueting Ban, Mingjie Zhang, Juepeng Zheng, Boyu Zhou

2603.10678 2026-03-12 cs.LG

Surrogate models for nuclear fusion with parametric Shallow Recurrent Decoder Networks: applications to magnetohydrodynamics

M. Lo Verso, C. Introini, E. Cervi, L. Savoldi, J. N. Kutz, A. Cammi

English abstract

Magnetohydrodynamic (MHD) effects play a key role in the design and operation of nuclear fusion systems, where electrically conducting fluids (such as liquid metals or molten salts in reactor blankets) interact with magnetic fields of varying intensity and orientation, which affect the resulting flow. The numerical resolution of MHD models involves highly nonlinear multiphysics systems of equations and can become computationally expensive, particularly in multi-query, parametric, or real-time contexts. This work investigates a fully data-driven framework for MHD state reconstruction that combines dimensionality reduction via Singular Value Decomposition (SVD) with the SHallow REcurrent Decoder (SHRED), a neural network architecture designed to recover the full spatio-temporal state from sparse time-series measurements of a limited number of observables. The methodology is applied to a parametric MHD test case involving compressible lead-lithium flow in a stepped channel subjected to thermal gradients and magnetic fields spanning a broad range of intensities. To improve efficiency, the full-order dataset is first compressed using SVD, yielding a reduced representation used as reference truth for training. Only temperature measurements from three sensors are provided as input, while the network reconstructs the full fields of velocity, pressure, and temperature. To assess robustness with respect to sensor placement, thirty randomly generated sensor configurations are tested in ensemble mode. Results show that SHRED accurately reconstructs the full MHD state even for magnetic field intensities not included in the training set. These findings demonstrate the potential of SHRED as a computationally efficient surrogate modeling strategy for fusion-relevant multiphysics problems, enabling low-cost state estimation with possible applications in real-time monitoring and control.
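
The SVD compression step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the synthetic snapshot matrix, the function names `svd_compress` and `reconstruct`, and the chosen rank are assumptions made only for illustration, and the SHRED network itself is not shown.

```python
import numpy as np

def svd_compress(snapshots, rank):
    """Truncated SVD of a snapshot matrix with shape (space, time).

    Returns the leading spatial modes and the reduced temporal
    coefficients. Hypothetical helper, not taken from the paper.
    """
    U, s, Vt = np.linalg.svd(snapshots, full_matrices=False)
    basis = U[:, :rank]                    # leading spatial modes
    coeffs = s[:rank, None] * Vt[:rank]    # reduced coefficients over time
    return basis, coeffs

def reconstruct(basis, coeffs):
    """Map reduced coefficients back to the full spatial field."""
    return basis @ coeffs

# Synthetic stand-in for a full-order dataset: an exactly rank-2 field.
x = np.linspace(0.0, 1.0, 200)[:, None]   # 200 spatial points
t = np.linspace(0.0, 1.0, 50)[None, :]    # 50 time steps
snapshots = (np.sin(2 * np.pi * x) * np.cos(2 * np.pi * t)
             + 0.5 * np.cos(4 * np.pi * x) * np.sin(6 * np.pi * t))

basis, coeffs = svd_compress(snapshots, rank=2)
rel_error = (np.linalg.norm(snapshots - reconstruct(basis, coeffs))
             / np.linalg.norm(snapshots))
print(basis.shape, coeffs.shape)
```

In a setting like the one described, the reduced coefficients would serve as the training target in place of the full field, which is what makes the compression worthwhile when the spatial dimension is large.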

2603.10677 2026-03-12 cs.AI cs.CL

Emulating Clinician Cognition via Self-Evolving Deep Clinical Research

Ruiyang Ren, Yuhao Wang, Yunsen Liang, Lan Luo, Jing Liu, Haifeng Wang, Cong Feng, Yinan Zhang, Chunyan Miao, Ji-Rong Wen, Wayne Xin Zhao

2603.10675 2026-03-12 cs.RO

Cybo-Waiter: A Physical Agentic Framework for Humanoid Whole-Body Locomotion-Manipulation

Peng Ren, Haoyang Ge, Chuan Qi, Cong Huang, Hong Li, Jiang Zhao, Pei Chi, Kai Chen

2603.10670 2026-03-12 cs.RO cs.SY eess.SY

Dynamic Modeling and Attitude Control of a Reaction-Wheel-Based Low-Gravity Bipedal Hopper

Shriram Hari, M Venkata Sai Nikhil, R Prasanth Kumar

Comments Preprint. Under review

2603.10661 2026-03-12 cs.AI cs.LG

FAME: Formal Abstract Minimal Explanation for Neural Networks

Ryma Boumazouza, Raya Elsaleh, Melanie Ducoffe, Shahaf Bassan, Guy Katz

2603.10660 2026-03-12 cs.RO

STM32-Based Smart Waste Bin for Hygienic Disposal Using Embedded Sensing and Automated Control

Mohammed Aman Bhuiyan, Aritra Islam Saswato, Md. Misbah Khan, Anish Paul, Ahmed Faizul Haque Dhrubo, Mohammad Abdul Qayum

Comments This paper consists of 6 pages, with 3 figures, 3 tables, and 1 algorithm

2603.10640 2026-03-12 cs.CL

Making Bielik LLM Reason (Better): A Field Report

Adam Trybus, Bartosz Bartnicki, Remigiusz Kinas

2603.10638 2026-03-12 cs.CV

Splat2Real: Novel-view Scaling for Physical AI with 3D Gaussian Splatting

Hansol Lim, Jongseong Brad Choi

2603.10624 2026-03-12 cs.LG cs.AI cs.CL

Reinforcement Learning with Conditional Expectation Reward

Changyi Xiao, Caijun Xu, Yixin Cao

2603.10616 2026-03-12 cs.RO

AdaClearGrasp: Learning Adaptive Clearing for Zero-Shot Robust Dexterous Grasping in Densely Cluttered Environments

Zixuan Chen, Wenquan Zhang, Jing Fang, Ruiming Zeng, Zhixuan Xu, Yiwen Hou, Xinke Wang, Jieqi Shi, Jing Huo, Yang Gao

Comments 12 pages. Under review

English abstract

In densely cluttered environments, physical interference, visual occlusions, and unstable contacts often cause direct dexterous grasping to fail, while aggressive singulation strategies may compromise safety. Enabling robots to adaptively decide whether to clear surrounding objects or directly grasp the target is therefore crucial for robust manipulation. We propose AdaClearGrasp, a closed-loop decision-execution framework for adaptive clearing and zero-shot dexterous grasping in densely cluttered environments. The framework formulates manipulation as a controllable high-level decision process that determines whether to directly grasp the target or first clear surrounding objects. A pretrained vision-language model (VLM) interprets visual observations and language task descriptions to reason about grasp interference and generate a high-level planning skeleton, which invokes structured atomic skills through a unified action interface. For dexterous grasping, we train a reinforcement learning policy with a relative hand-object distance representation, enabling zero-shot generalization across diverse object geometries and physical properties. During execution, visual feedback monitors outcomes and triggers replanning upon failures, forming a closed-loop correction mechanism. To evaluate language-conditioned dexterous grasping in clutter, we introduce Clutter-Bench, the first simulation benchmark with graded clutter complexity. It includes seven target objects across three clutter levels, yielding 210 task scenarios. We further perform sim-to-real experiments on three objects under three clutter levels (18 scenarios). Results demonstrate that AdaClearGrasp significantly improves grasp success rates in densely cluttered environments. For more videos and code, please visit our project website: https://chenzixuan99.github.io/adaclear-grasp.github.io/.

2603.10613 2026-03-12 cs.CL cs.CV

MUNIChus: Multilingual News Image Captioning Benchmark

Yuji Chen, Alistair Plum, Hansi Hettiarachchi, Diptesh Kanojia, Saroj Basnet, Marcos Zampieri, Tharindu Ranasinghe

Comments Accepted to LREC 2026 (The Fifteenth biennial Language Resources and Evaluation Conference)

2603.10609 2026-03-12 cs.RO

Learning Bimanual Cloth Manipulation with Vision-based Tactile Sensing via Single Robotic Arm

Dongmyoung Lee, Wei Chen, Xiaoshuai Chen, Rui Zong, Petar Kormushev

Comments 11 pages, 13 figures

2603.10600 2026-03-12 cs.AI cs.DB cs.IR

Trajectory-Informed Memory Generation for Self-Improving Agent Systems

Gaodan Fang, Vatche Isahagian, K. R. Jayaram, Ritesh Kumar, Vinod Muthusamy, Punleuk Oum, Gegi Thomas

2603.10597 2026-03-12 cs.RO cs.AI

Recover to Predict: Progressive Retrospective Learning for Variable-Length Trajectory Prediction

Hao Zhou, Lu Qi, Jason Li, Jie Zhang, Yi Liu, Xu Yang, Mingyu Fan, Fei Luo

Comments Paper is accepted by CVPR 2026

2603.10592 2026-03-12 cs.LG cs.AI

Gradient Flow Drifting: Generative Modeling via Wasserstein Gradient Flows of KDE-Approximated Divergences

Jiarui Cao, Zixuan Wei, Yuxin Liu

2603.10588 2026-03-12 cs.AI cs.CL cs.LG

Does LLM Alignment Really Need Diversity? An Empirical Study of Adapting RLVR Methods for Moral Reasoning

Zhaowei Zhang, Xiaohan Liu, Xuekai Zhu, Junchao Huang, Ceyao Zhang, Zhiyuan Feng, Yaodong Yang, Xiaoyuan Yi, Xing Xie

2603.10587 2026-03-12 cs.SD

Distilling LLM Semantic Priors into Encoder-Only Multi-Talker ASR with Talker-Count Routing

Hao Shi, Yusuke Fujita, Roman Koshkin, Mengjie Zhao, Yuan Gao, Lianbo Liu, Yui Sudo

2603.10583 2026-03-12 cs.CV

Attribution as Retrieval: Model-Agnostic AI-Generated Image Attribution

Hongsong Wang, Renxi Cheng, Chaolei Han, Jie Gui

Comments To appear in CVPR 2026, Code is at https://github.com/hongsong-wang/LIDA

2603.10582 2026-03-12 cs.LG

HAPEns: Hardware-Aware Post-Hoc Ensembling for Tabular Data

Jannis Maier, Lennart Purucker

Comments 10 pages (7 Appendix), 15 figures