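arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems science, and other fields.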

2603.25646 2026-03-27 cs.RO cs.AI cs.HC

A Mentalistic Interface for Probing Folk-Psychological Attribution to Non-Humanoid Robots

Giulio Pisaneschi, Pierpaolo Serio, Estelle Gerbier, Andrea Dan Ryals, Lorenzo Pollini, Mario G. C. A. Cimino

Comments Preprint submitted to IEEE. 8 pages, 21 figures

2603.25636 2026-03-27 cs.CV

Designing Any Imaging System from Natural Language: Agent-Constrained Composition over a Finite Primitive Basis

Chengshuai Yang

Comments 28 pages, 7 figures, 8 tables, includes Supplementary Information (sections S1-S6)

2603.25635 2026-03-27 cs.LG physics.ao-ph

Anchored-Branched Steady-state WInd Flow Transformer (AB-SWIFT): a metamodel for 3D atmospheric flow in urban environments

Armand de Villeroché, Rem-Sophia Mouradi, Vincent Le Guen, Sibo Cheng, Marc Bocquet, Alban Farchi, Patrick Armand, Patrick Massin

2603.25633 2026-03-27 cs.AI

Is Mathematical Problem-Solving Expertise in Large Language Models Associated with Assessment Performance?

Liang Zhang, Yu Fu, Xinyi Jin

2603.25629 2026-03-27 cs.CV cs.LG

LanteRn: Latent Visual Structured Reasoning

André G. Viveiros, Nuno Gonçalves, Matthias Lindemann, André Martins

2603.25623 2026-03-27 cs.RO

Accurate Surface and Reflectance Modelling from 3D Radar Data with Neural Radiance Fields

Judith Treffler, Vladimír Kubelka, Henrik Andreasson, Martin Magnusson

2603.25614 2026-03-27 cs.LG

Social Hippocampus Memory Learning

Liping Yi, Zhiming Zhao, Qinghua Hu

2603.25613 2026-03-27 cs.CV cs.AI

Demographic Fairness in Multimodal LLMs: A Benchmark of Gender and Ethnicity Bias in Face Verification

Ünsal Öztürk, Hatef Otroshi Shahreza, Sébastien Marcel

Comments Accepted in CVPR 2026 workshops

2603.25607 2026-03-27 cs.CV cs.AI

DeepFAN, a transformer-based deep learning model for human-artificial intelligence collaborative assessment of incidental pulmonary nodules in CT scans: a multi-reader, multi-case trial

Zhenchen Zhu, Ge Hu, Weixiong Tan, Kai Gao, Chao Sun, Zhen Zhou, Kepei Xu, Wei Han, Meixia Shang, Xiaoming Qiu, Yiqing Tan, Jinhua Wang, Zhoumeng Ying, Li Peng, Wei Song, Lan Song, Zhengyu Jin, Nan Hong, Yizhou Yu

Comments 28 pages for main text and 37 pages for supplementary information, 7 figures in main text and 9 figures in supplementary information

2603.25597 2026-03-27 cs.LG nlin.AO

Spatiotemporal System Forecasting with Irregular Time Steps via Masked Autoencoder

Kewei Zhu, Yanze Xin, Jinwei Hu, Xiaoyuan Cheng, Yiming Yang, Sibo Cheng

2603.25583 2026-03-27 cs.RO

Towards Generalizable Robotic Data Flywheel: High-Dimensional Factorization and Composition

Yuyang Xiao, Yifei Zhou, Haoran Wang, Wenxuan Ou, Yuxiao Liu

2603.25580 2026-03-27 cs.CV

UNIC: Neural Garment Deformation Field for Real-time Clothed Character Animation

Chengfeng Zhao, Junbo Qi, Yulou Liu, Zhiyang Dou, Minchen Li, Taku Komura, Ziwei Liu, Wenping Wang, Yuan Liu

Comments Project page: https://igl-hkust.github.io/UNIC/

2603.25573 2026-03-27 cs.CV cs.LG

Hierarchy-Guided Multimodal Representation Learning for Taxonomic Inference

Sk Miraj Ahmed, Xi Yu, Yunqi Li, Yuewei Lin, Wei Xu

Comments Accepted at the ICLR 2026 Workshop on Foundation Models for Science (FM4Science)

2603.25565 2026-03-27 cs.CV

GeoHeight-Bench: Towards Height-Aware Multimodal Reasoning in Remote Sensing

Xuran Hu, Zhitong Xiong, Zhongcheng Hong, Yifang Ban, Xiaoxiang Zhu, Wufan Zhao

Comments 18 pages, 4 figures

2603.25555 2026-03-27 cs.CV

Towards Comprehensive Real-Time Scene Understanding in Ophthalmic Surgery through Multimodal Image Fusion

Nikolo Rohrmoser, Ghazal Ghazaei, Michael Sommersperger, Nassir Navab

Details

English abstract

Purpose: The integration of multimodal imaging into operating rooms paves the way for comprehensive surgical scene understanding. In ophthalmic surgery, two complementary imaging modalities are now available: operating microscope (OPMI) imaging and real-time intraoperative optical coherence tomography (iOCT). This first work toward temporal OPMI and iOCT feature fusion demonstrates the potential of multimodal image processing for multi-head prediction through the example of precise instrument tracking in vitreoretinal surgery. Methods: We propose a multimodal, temporal, real-time capable network architecture to perform joint instrument detection, keypoint localization, and tool-tissue distance estimation. Our network design integrates a cross-attention fusion module to merge OPMI and iOCT image features, which are efficiently extracted via a YoloNAS and a CNN encoder, respectively. Furthermore, a region-based recurrent module leverages temporal coherence. Results: Our experiments demonstrate reliable instrument localization and keypoint detection (95.79% mAP50) and show that the incorporation of iOCT significantly improves tool-tissue distance estimation, while achieving real-time processing at 22.5 ms per frame. Especially for close distances to the retina (below 1 mm), the distance estimation accuracy improved from 284 μm (OPMI only) to 33 μm (multimodal). Conclusion: Feature fusion of multimodal imaging can enhance multi-task prediction accuracy compared to single-modality processing, and real-time processing performance can be achieved through tailored network design. While our results demonstrate the potential of multimodal processing for image-guided vitreoretinal surgery, they also underline key challenges that motivate future research toward more reliable, consistent, and comprehensive surgical scene understanding.

2603.25544 2026-03-27 cs.RO

Towards Embodied AI with MuscleMimic: Unlocking full-body musculoskeletal motor learning at scale

Chengkun Li, Cheryl Wang, Bianca Ziliotto, Merkourios Simos, Jozsef Kovecses, Guillaume Durandau, Alexander Mathis

2603.25539 2026-03-27 cs.CV

PAWS: Perception of Articulation in the Wild at Scale from Egocentric Videos

Yihao Wang, Yang Miao, Wenshuai Zhao, Wenyan Yang, Zihan Wang, Joni Pajarinen, Luc Van Gool, Danda Pani Paudel, Juho Kannala, Xi Wang, Arno Solin

Comments 32 pages, 13 figures. Project page: https://aaltoml.github.io/PAWS/

2603.25537 2026-03-27 cs.CL

Humans vs Vision-Language Models: A Unified Measure of Narrative Coherence

Nikolai Ilinykh, Hyewon Jang, Shalom Lappin, Asad Sayeed, Sharid Loáiciga

Comments 9 pages of content, 1 page of appendices, 9 tables, 3 figures

2603.25535 2026-03-27 cs.CV cs.LG

Insights on back marking for the automated identification of animals

David Brunner, Marie Bordes, Elisabeth Mayrhuber, Stephan M. Winkler, Viktoria Dorfer, Maciej Oczak

2603.25533 2026-03-27 cs.CV

BFMD: A Full-Match Badminton Dense Dataset for Dense Shot Captioning

Ning Ding, Keisuke Fujii, Toru Tamaki

Comments CVSports2026 accepted

2603.25524 2026-03-27 cs.CV cs.AI

CHIRP dataset: towards long-term, individual-level, behavioral monitoring of bird populations in the wild

Alex Hoi Hang Chan, Neha Singhal, Onur Kocahan, Andrea Meltzer, Saverio Lubrano, Miyako H. Warrington, Michel Griesser, Fumihiro Kano, Hemal Naik

Comments 8 pages, 4 figures

2603.25510 2026-03-27 cs.CV cs.AI cs.LG eess.IV

Challenges in Hyperspectral Imaging for Autonomous Driving: The HSI-Drive Case

Koldo Basterretxea, Jon Gutiérrez-Zaballa, Javier Echanobe

2603.25502 2026-03-27 cs.CV

RealRestorer: Towards Generalizable Real-World Image Restoration with Large-Scale Image Editing Models

Yufeng Yang, Xianfang Zeng, Zhangqi Jiang, Fukun Yin, Jianzhuang Liu, Wei Cheng, Jinghong Lan, Shiyu Liu, Yuqi Peng, Gang Yu, Shifeng Chen

Comments 27 pages, 15 figures, Project homepage: https://yfyang007.github.io/RealRestorer/

2603.25501 2026-03-27 cs.CL

An Experimental Comparison of the Most Popular Approaches to Fake News Detection

Pietro Dell'Oglio, Alessandro Bondielli, Francesco Marcelloni, Lucia C. Passaro

Details

DOI: 10.1016/j.ins.2026.123407
Journal ref: Dell'Oglio, P., Bondielli, A., Marcelloni, F., and Passaro, L. C. (2026). An experimental comparison of the most popular approaches to fake news detection. Information Sciences, Article 123407

English abstract

In recent years, fake news detection has received increasing attention in public debate and scientific research. Despite advances in detection techniques, the production and spread of false information have become more sophisticated, driven by Large Language Models (LLMs) and the amplification power of social media. We present a critical assessment of 12 representative fake news detection approaches, spanning traditional machine learning, deep learning, transformers, and specialized cross-domain architectures. We evaluate these methods on 10 publicly available datasets differing in genre, source, topic, and labeling rationale. We address text-only English fake news detection as a binary classification task by harmonizing labels into "Real" and "Fake" to ensure a consistent evaluation protocol. We acknowledge that label semantics vary across datasets and that harmonization inevitably removes such semantic nuances. Each dataset is treated as a distinct domain. We conduct in-domain, multi-domain and cross-domain experiments to simulate real-world scenarios involving domain shift and out-of-distribution data. Fine-tuned models perform well in-domain but struggle to generalize. Cross-domain architectures can reduce this gap but are data-hungry, while LLMs offer a promising alternative through zero- and few-shot learning. Given inherent dataset confounds and possible pre-training exposure, results should be interpreted as robustness evaluations within this English, text-only protocol.

2603.25499 2026-03-27 cs.CV cs.LG

Knowledge-Guided Failure Prediction: Detecting When Object Detectors Miss Safety-Critical Objects

Jakob Paul Zimmermann, Gerrit Holzbach, David Lerch

2603.25498 2026-03-27 cs.AI

EcoThink: A Green Adaptive Inference Framework for Sustainable and Accessible Agents

Linxiao Li, Zhixiang Lu

Comments Accepted by WWW 2026

2603.25495 2026-03-27 cs.LG cs.AI

Interpretable PM2.5 Forecasting for Urban Air Quality: A Comparative Study of Operational Time-Series Models

Moazzam Umer Gondal, Hamad ul Qudous, Asma Ahmad Farhan, Sultan Alamri

Comments Submitted to PLOS ONE

Details

English abstract

Accurate short-term air-quality forecasting is essential for public health protection and urban management, yet many recent forecasting frameworks rely on complex, data-intensive, and computationally demanding models. This study investigates whether lightweight and interpretable forecasting approaches can provide competitive performance for hourly PM2.5 prediction in Beijing, China. Using multi-year pollutant and meteorological time-series data, we developed a leakage-aware forecasting workflow that combined chronological data partitioning, preprocessing, feature selection, and exogenous-driver modeling under the Perfect Prognosis setting. Three forecasting families were evaluated: SARIMAX, Facebook Prophet, and NeuralProphet. To assess practical deployment behavior, the models were tested under two adaptive regimes: weekly walk-forward refitting and frozen forecasting with online residual correction. Results showed clear differences in both predictive accuracy and computational efficiency. Under walk-forward refitting, Facebook Prophet achieved the strongest completed performance, with an MAE of $37.61$ and an RMSE of $50.10$, while also requiring substantially less execution time than NeuralProphet. In the frozen-model regime, online residual correction improved Facebook Prophet and SARIMAX, with corrected SARIMAX yielding the lowest overall error (MAE $32.50$; RMSE $46.85$). NeuralProphet remained less accurate and less stable across both regimes, and residual correction did not improve its forecasts. Notably, corrected Facebook Prophet reached nearly the same error as its walk-forward counterpart while reducing runtime from $15$ min $21.91$ sec to $46.60$ sec. These findings show that lightweight additive forecasting strategies can remain highly competitive for urban air-quality prediction, offering a practical balance between accuracy, interpretability, ...

2603.25494 2026-03-27 cs.CV

AdaSFormer: Adaptive Serialized Transformers for Monocular Semantic Scene Completion from Indoor Environments

Xuzhi Wang, Xinran Wu, Song Wang, Lingdong Kong, Ziping Zhao

Comments Accepted at CVPR 2026

2603.25489 2026-03-27 cs.CL

Translation Asymmetry in LLMs as a Data Augmentation Factor: A Case Study for 6 Romansh Language Varieties

Jannis Vamvas, Ignacio Pérez Prat, Angela Heldstab, Dominic P. Fischer, Sina Ahmadi, Rico Sennrich

Comments Preprint

2603.25481 2026-03-27 cs.RO

LILAC: Language-Conditioned Object-Centric Optical Flow for Open-Loop Trajectory Generation

Motonari Kambara, Koki Seno, Tomoya Kaichi, Yanan Wang, Komei Sugiura

Comments Accepted to IEEE RA-L