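arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.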

2603.20310 2026-03-24 cs.CV cs.GR

GraphiContact: Pose-aware Human-Scene Robust Contact Perception for Interactive Systems

Xiaojian Lin, Yaomin Shen, Junyuan Ma, Yujie Sun, Chengqing Bu, Wenxin Zhang, Zongzheng Zhang, Hao Fei, Lei Jin, Hao Zhao

Comments 15 pages, 9 figures, Accepted at ICME 2026

Abstract

Monocular vertex-level human-scene contact prediction is a fundamental capability for interactive systems such as assistive monitoring, embodied AI, and rehabilitation analysis. In this work, we study this task jointly with single-image 3D human mesh reconstruction, using reconstructed body geometry as a scaffold for contact reasoning. Existing approaches either focus on contact prediction without sufficiently exploiting explicit 3D human priors, or emphasize pose/mesh reconstruction without directly optimizing robust vertex-level contact inference under occlusion and perceptual noise. To address this gap, we propose GraphiContact, a pose-aware framework that transfers complementary human priors from two pretrained Transformer encoders and predicts per-vertex human-scene contact on the reconstructed mesh. To improve robustness in real-world scenarios, we further introduce a Single-Image Multi-Infer Uncertainty (SIMU) training strategy with token-level adaptive routing, which simulates occlusion and noisy observations during training while preserving efficient single-branch inference at test time. Experiments on five benchmark datasets show that GraphiContact achieves consistent gains on both contact prediction and 3D human reconstruction. Our code, based on the GraphiContact method, provides comprehensive 3D human reconstruction and interaction analysis, and will be publicly available at https://github.com/Aveiro-Lin/GraphiContact.

2603.20307 2026-03-24 cs.CV cs.AI cs.MM cs.SD

EARTalking: End-to-end GPT-style Autoregressive Talking Head Synthesis with Frame-wise Control

Yuzhe Weng, Haotian Wang, Yuanhong Yu, Jun Du, Shan He, Xiaoyan Wu, Haoran Xu

2603.20305 2026-03-24 cs.CV

The Global-Local loop: what is missing in bridging the gap between geospatial data from numerous communities?

Clément Mallet, Ana-Maria Raimond

Comments Accepted at the 2026 ISPRS Congress

2603.20303 2026-03-24 cs.CV cs.AI

InjectFlow: Weak Guides Strong via Orthogonal Injection for Flow Matching

Dayu Wang, Jiaye Yang, Weikang Li, Jiahui Liang, Yang Li

2603.20297 2026-03-24 cs.LG cs.AI

Transformer-Based Predictive Maintenance for Risk-Aware Instrument Calibration

Adithya Parthasarathy, Aswathnarayan Muthukrishnan Kirubakaran, Akshay Deshpande, Ram Sekhar Bodala, Suhas Malempati, Nachiappan Chockalingam, Vinoth Punniyamoorthy, Seema Gangaiah Aarella

2603.20296 2026-03-24 cs.LG cs.AI

Collaborative Adaptive Curriculum for Progressive Knowledge Distillation

Jing Liu, Zhenchao Ma, Han Yu, Bobo Ju, Wenliang Yang, Chengfang Li, Bo Hu, Liang Song

Comments Accepted by IEEE ICME 2026

2603.20295 2026-03-24 cs.LG cs.AI

MARLIN: Multi-Agent Reinforcement Learning for Incremental DAG Discovery

Dong Li, Zhengzhang Chen, Xujiang Zhao, Linlin Yu, Zhong Chen, Yi He, Haifeng Chen, Chen Zhao

Comments AAAI 2026

2603.20293 2026-03-24 cs.AI

LLM-Enhanced Energy Contrastive Learning for Out-of-Distribution Detection in Text-Attributed Graphs

Xiaoxu Ma, Dong Li, Minglai Shao, Xintao Wu, Chen Zhao

Comments AAAI 2026

2603.20292 2026-03-24 cs.CV cs.AI cs.LG

HSI Image Enhancement Classification Based on Knowledge Distillation: A Study on Forgetting

Songfeng Zhu

Comments 18 pages, 7 figures

2603.20290 2026-03-24 cs.CV cs.RO eess.IV

Transparent Fragments Contour Estimation via Visual-Tactile Fusion for Autonomous Reassembly

Qihao Lin, Borui Chen, Yuping Zhou, Jianing Wu, Yulan Guo, Weishi Zheng, Chongkun Xia

Comments 17 pages, 22 figures, submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence

Abstract

Contour estimation of transparent fragments is essential for autonomous reassembly, especially in precision optical instrument repair, cultural relic restoration, and the analysis of breakage accidents involving other valuable devices. Unlike intact transparent objects, transparent fragments pose greater challenges for contour estimation because of their strict optical properties and their irregular shapes and edges. To address this issue, this paper proposes a general transparent fragment contour estimation framework based on visual-tactile fusion. First, we construct a transparent fragment dataset named TransFrag27K, which includes multi-scene synthetic data of broken fragments from multiple types of transparent objects, together with a scalable synthetic data generation pipeline. Second, we propose a visual grasping position detection network named TransFragNet to identify, locate, and segment the sampling grasping position. We then use a two-finger gripper with GelSight Mini sensors to obtain reconstructed tactile information about the lateral edges of the fragments. By fusing this tactile information with visual cues, we propose a visual-tactile fusion material classifier. Inspired by the way humans estimate a fragment's contour by combining vision and touch, we introduce a general transparent fragment contour estimation framework based on visual-tactile fusion, which demonstrates strong performance in real-world validation. Finally, we propose a contour matching and reassembly algorithm based on multi-dimensional similarity metrics, providing a reproducible benchmark for evaluating visual-tactile contour estimation and fragment reassembly. The experimental results demonstrate the validity of the proposed framework. The dataset and code are available at https://github.com/Keithllin/Transparent-Fragments-Contour-Estimation.

2603.20289 2026-03-24 cs.CV

Remote Sensing Image Dehazing: A Systematic Review of Progress, Challenges, and Prospects

Heng Zhou, Xiaoxiong Liu, Zhenxi Zhang, Jieheng Yun, Chengyang Li, Yunchu Yang, Dongyi Xia, Chunna Tian, Xiao-Jun Wu

Comments 82 pages, 23 figures

DOI: 10.1016/j.isprsjprs.2026.03.008
Journal ref: ISPRS P&RS 2026

Abstract

Remote sensing images (RSIs) are frequently degraded by haze, fog, and thin clouds, which obscure surface reflectance and hinder downstream applications. This study presents the first systematic and unified survey of RSI dehazing, integrating methodological evolution, benchmark assessment, and physical consistency analysis. We categorize existing approaches into a three-stage progression, from handcrafted physical priors, to data-driven deep restoration, and finally to hybrid physical-intelligent generation, and summarize more than 30 representative methods across CNNs, GANs, Transformers, and diffusion models. To provide a reliable empirical reference, we conduct large-scale quantitative experiments on five public datasets using 12 metrics, including PSNR, SSIM, CIEDE, LPIPS, FID, SAM, ERGAS, UIQI, QNR, NIQE, and HIST. Cross-domain comparison reveals that recent Transformer- and diffusion-based models improve SSIM by 12% to 18% and reduce perceptual errors by 20% to 35% on average, while hybrid physics-guided designs achieve higher radiometric stability. A dedicated physical radiometric consistency experiment further demonstrates that models with explicit transmission or airlight constraints reduce color bias by up to 27%. Based on these findings, we summarize open challenges (dynamic atmospheric modeling, multimodal fusion, lightweight deployment, data scarcity, and joint degradations) and outline promising research directions for the future development of trustworthy, controllable, and efficient (TCE) dehazing systems. All reviewed resources, including source code, benchmark datasets, evaluation metrics, and reproduction configurations, are publicly available at https://github.com/VisionVerse/RemoteSensing-Restoration-Survey.
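
As a reader's aid for the first metric named in the abstract, PSNR is conventionally defined as 10·log10(MAX² / MSE). The snippet below is a minimal sketch of that standard definition only. It is not the survey's evaluation code, and the function name `psnr`, the flat pixel sequences, and the default peak value of 255 are assumptions made for illustration.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values indicate a restored image closer to the haze-free reference; a real benchmark would typically compute this per image over full arrays and then average.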

2603.20288 2026-03-24 cs.CV

Efficient Visual Anomaly Detection at the Edge: Enabling Real-Time Industrial Inspection on Resource-Constrained Devices

Arianna Stropeni, Fabrizio Genilotti, Francesco Borsatti, Manuel Barusco, Davide Dalle Pezze, Gian Antonio Susto

2603.20285 2026-03-24 cs.AI

AgentComm-Bench: Stress-Testing Cooperative Embodied AI Under Latency, Packet Loss, and Bandwidth Collapse

Aayam Bansal, Ishaan Gangwani

2603.20280 2026-03-24 cs.CV cs.AR cs.LG

Mix-and-Match Pruning: Globally Guided Layer-Wise Sparsification of DNNs

Danial Monachan, Samira Nazari, Mahdi Taheri, Ali Azarpeyvand, Milos Krstic, Michael Huebner, Christian Herglotz

2603.20276 2026-03-24 cs.AI

Me, Myself, and $π$: Evaluating and Explaining LLM Introspection

Atharv Naphade, Samarth Bhargav, Sean Lim, Mcnair Shah

Comments 20 pages, 12 figures, ICLR 2026 Workshop: From Human Cognition to AI Reasoning: Models, Methods, and Applications

2603.20275 2026-03-24 cs.CV cs.AI

Understanding Pruning Regimes in Vision-Language Models Through Domain-Aware Layer Selection

Saeed Khaki, Nima Safaei, Kamal Ginotra

2603.20273 2026-03-24 cs.CV cs.AI

Efficient AI-Driven Multi-Section Whole Slide Image Analysis for Biochemical Recurrence Prediction in Prostate Cancer

Yesung Cho, Dongmyung Shin, Sujeong Hong, Jooyeon Lee, Seongmin Park, Geongyu Lee, Jongbae Park, Hong Koo Ha

2603.20270 2026-03-24 cs.AI

FactorSmith: Agentic Simulation Generation via Markov Decision Process Decomposition with Planner-Designer-Critic Refinement

Ali Shamsaddinlou, Morteza NourelahiAlamdari

2603.20267 2026-03-24 cs.AI

Domain-Specialized Tree of Thought through Plug-and-Play Predictors

Xuanqi Gao, Haoyu Wang, Jun Sun, Shiqing Ma, Chao Shen

2603.20260 2026-03-24 cs.AI

ProMAS: Proactive Error Forecasting for Multi-Agent Systems Using Markov Transition Dynamics

Xinkui Zhao, Sai Liu, Yifan Zhang, Qingyu Ma, Guanjie Cheng, Naibo Wang, Chang Liu

2603.20256 2026-03-24 cs.CL cs.AI cs.CE cs.LG cs.MA cs.SY eess.SY

SciNav: A General Agent Framework for Scientific Coding Tasks

Tianshu Zhang, Huan Sun

Comments Accepted by ICLR 2026

Abstract

Autonomous science agents built on large language models (LLMs) are increasingly used to generate hypotheses, design experiments, and produce reports. However, prior work mainly targets open-ended scientific problems with subjective outputs that are difficult to evaluate. Scientific coding benchmarks, by contrast, provide executable outputs for objective assessment. Existing approaches remain engineering-driven pipelines, revealing the need for structured, end-to-end science agent frameworks for scientific coding tasks. We address this gap by focusing on scientific coding tasks, where evaluation can be made rigorously, and introducing an agent framework SciNav (Scientific Navigator) that enables more effective solution exploration. Our framework is designed to operate under constrained search budgets, moving beyond reliance on pre-defined success metrics and prolonged search cycles. Inspired by findings that comparative judgments often reveal finer-grained quality differences and therefore provide greater discriminative power than absolute scoring, our framework leverages pairwise relative judgments within a tree search process to select top-K promising solution branches, prune low-potential ones, and progressively narrow down the solution candidates on the selected branches guided by relative comparisons. We demonstrate our agent's effectiveness across different types of tasks on two benchmarks. Experiments show that SciNav significantly outperforms direct prompting and prior agents like OpenHands and Self-Debug across different base models, task types, and difficulty levels, and exceeds different frontier comparators such as random selection and LLM absolute scoring. These results confirm the strength of our agent design and highlight the effectiveness of relative judgment-guided top-K search for high-quality scientific coding, marking a step toward more practical science agents.
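
The idea of ranking candidates by pairwise relative judgments and keeping the top-K, as described in the abstract, can be illustrated with a small sketch. This is not SciNav's implementation: the round-robin win count is a swapped-in simplification, and `select_top_k` and the `prefer` judge are hypothetical names. In an agent, `prefer` might wrap an LLM comparison of two candidate solutions.

```python
from itertools import combinations
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def select_top_k(candidates: Sequence[T],
                 prefer: Callable[[T, T], bool],
                 k: int) -> List[T]:
    """Rank candidates by round-robin pairwise wins and keep the k best.

    `prefer(a, b)` is a hypothetical judge returning True when `a` is
    judged better than `b`.
    """
    wins = [0] * len(candidates)
    for i, j in combinations(range(len(candidates)), 2):
        if prefer(candidates[i], candidates[j]):
            wins[i] += 1
        else:
            wins[j] += 1
    # sorted() is stable, so ties keep the original candidate order.
    ranked = sorted(range(len(candidates)), key=lambda idx: -wins[idx])
    return [candidates[idx] for idx in ranked[:k]]


# Toy usage: integers stand in for solutions, "larger is better" for the judge.
best = select_top_k([3, 9, 1, 7], lambda a, b: a > b, k=2)
```

A full round-robin needs n(n-1)/2 comparisons, so under a constrained search budget one would likely compare only a subset of pairs.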

2603.20255 2026-03-24 cs.CL cs.HC cs.LG cs.SD eess.AS

Abjad-Kids: An Arabic Speech Classification Dataset for Primary Education

Abdul Aziz Snoubara, Baraa Al_Maradni, Haya Al_Naal, Malek Al_Madrmani, Roaa Jdini, Seedra Zarzour, Khloud Al Jallad

Abstract

Speech-based AI educational applications have gained significant interest in recent years, particularly for children. However, children's speech research remains limited due to the lack of publicly available datasets, especially for low-resource languages such as Arabic. This paper presents Abjad-Kids, an Arabic speech dataset designed for kindergarten and primary education, focusing on fundamental learning of alphabets, numbers, and colors. The dataset consists of 46,397 audio samples collected from children aged 3 to 12 years, covering 141 classes. All samples were recorded under controlled specifications to ensure consistency in duration, sampling rate, and format. To address high intra-class similarity among Arabic phonemes and the limited samples per class, we propose a hierarchical audio classification approach based on CNN-LSTM architectures. Our proposed methodology decomposes alphabet recognition into a two-stage process: an initial grouping classification model followed by specialized classifiers for each group. Two grouping strategies, static linguistic-based grouping and dynamic clustering-based grouping, were evaluated. Experimental results demonstrate that static linguistic-based grouping achieves superior performance. Comparisons between traditional machine learning and deep learning approaches highlight the effectiveness of CNN-LSTM models combined with data augmentation. Despite achieving promising results, most of our experiments indicate a challenge with overfitting, which is likely due to the limited number of samples, even after data augmentation and model regularization. Thus, future work may focus on collecting additional data to address this issue. Abjad-Kids will be publicly available. We hope that Abjad-Kids enriches children's representation in speech datasets and serves as a good resource for future research in Arabic speech classification for kids.
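
The two-stage process described in the abstract (a grouping model followed by a specialized classifier per group) amounts to a simple dispatch pattern, sketched below. The function `hierarchical_predict` and the toy threshold models are hypothetical stand-ins for illustration. The paper's actual models are CNN-LSTM networks operating on audio, which are not reproduced here.

```python
from typing import Callable, Dict, Hashable, TypeVar

X = TypeVar("X")


def hierarchical_predict(sample: X,
                         group_model: Callable[[X], Hashable],
                         experts: Dict[Hashable, Callable[[X], str]]) -> str:
    """Stage 1 picks a group; stage 2 lets that group's expert pick the class."""
    group = group_model(sample)
    return experts[group](sample)


# Toy stand-ins: a single number plays the role of an audio feature.
def toy_group_model(x: float) -> str:
    return "group_a" if x < 0.5 else "group_b"


toy_experts = {
    "group_a": lambda x: "class_1" if x < 0.25 else "class_2",
    "group_b": lambda x: "class_3" if x < 0.75 else "class_4",
}

label = hierarchical_predict(0.6, toy_group_model, toy_experts)
```

Each expert only has to separate the classes within its own group, which is one way to cope with highly similar classes and few samples per class.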

2603.20252 2026-03-24 cs.CL q-fin.CP

FinReflectKG -- HalluBench: GraphRAG Hallucination Benchmark for Financial Question Answering Systems

Mahesh Kumar, Bhaskarjit Sarmah, Stefano Pasquali

Abstract

As organizations increasingly integrate AI-powered question-answering systems into financial information systems for compliance, risk assessment, and decision support, ensuring the factual accuracy of AI-generated outputs becomes a critical engineering challenge. Current Knowledge Graph (KG)-augmented QA systems lack systematic mechanisms to detect hallucinations, that is, factually incorrect outputs that undermine reliability and user trust. We introduce FinBench-QA-Hallucination, a benchmark for evaluating hallucination detection methods in KG-augmented financial QA over SEC 10-K filings. The dataset contains 755 annotated examples from 300 pages, each labeled for groundedness using a conservative evidence-linkage protocol requiring support from both textual chunks and extracted relational triplets. We evaluate six detection approaches, including LLM judges, fine-tuned classifiers, Natural Language Inference (NLI) models, span detectors, and embedding-based methods, under two conditions: with and without KG triplets. Results show that LLM-based judges and embedding approaches achieve the highest performance (F1: 0.82-0.86) under clean conditions. However, most methods degrade significantly when noisy triplets are introduced, with Matthews Correlation Coefficient (MCC) dropping 44-84 percent, while embedding methods remain relatively robust with only 9 percent degradation. Statistical tests (Cochran's Q and McNemar) confirm significant performance differences (p < 0.001). Our findings highlight vulnerabilities in current KG-augmented systems and provide insights for building reliable financial information systems, where hallucinations can lead to regulatory violations and flawed decisions. The benchmark also offers a framework for integrating AI reliability evaluation into information system design across other high-stakes domains such as healthcare, legal, and government.
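
For reference, the Matthews Correlation Coefficient cited in the abstract has a standard closed form over binary confusion-matrix counts, and a "percent drop" can be read as a relative change between two scores. The sketch below shows both under those assumptions. The function names are hypothetical, the counts in the usage line are made up for illustration, and how the paper itself computes degradation is not stated in the abstract.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, report 0 when any marginal total is empty.
    return numerator / denominator if denominator else 0.0


def relative_drop_percent(clean: float, noisy: float) -> float:
    """Relative degradation of a score, in percent of the clean score."""
    return (clean - noisy) / clean * 100.0


# Illustrative counts only, not taken from the benchmark.
example_score = mcc(tp=40, tn=45, fp=5, fn=10)
```

MCC ranges from -1 to 1 and stays informative under class imbalance, which is a common reason to report it alongside F1.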

2603.20246 2026-03-24 cs.CL cs.AI cs.NE q-bio.NC

Decoding the decoder: Contextual sequence-to-sequence modeling for intracortical speech decoding

Michal Olak, Tommaso Boccato, Matteo Ferrante

2603.20242 2026-03-24 cs.SD eess.AS

LL-SDR: Low-Latency Speech enhancement through Discrete Representations

Jingyi Li, Luca Della Libera, Mirco Ravanelli, Cem Subakan

Comments 5 pages, 1 figure

2603.20239 2026-03-24 cs.RO cs.CV

Rheos: Modelling Continuous Motion Dynamics in Hierarchical 3D Scene Graphs

Iacopo Catalano, Francesco Verdoja, Javier Civera, Jorge Peña-Queralta, Julio A. Placed

2603.20236 2026-03-24 cs.RO

EnergyAction: Unimanual to Bimanual Composition with Energy-Based Models

Mingchen Song, Xiang Deng, Jie Wei, Dongmei Jiang, Liqiang Nie, Weili Guan

2603.20234 2026-03-24 cs.RO cs.AI

Emergency Lane-Change Simulation: A Behavioral Guidance Approach for Risky Scenario Generation

Chen Xiong, Cheng Wang, Yuhang Liu, Zirui Wu, Ye Tian

2603.20233 2026-03-24 cs.RO

SwiftBot: A Decentralized Platform for LLM-Powered Federated Robotic Task Execution

YueMing Zhang, Shuai Xu, Zhengxiong Li, Fangtian Zhong, Xiaokun Yang, Hailu Xu

Comments This paper has been accepted by IEEE CCGrid 2026. We have uploaded it to arXiv as a preprint

2603.20232 2026-03-24 cs.RO cs.AI cs.LG

Fusing Driver Perceived and Physical Risk for Safety Critical Scenario Screening in Autonomous Driving

Chen Xiong, Ziwen Wang, Deqi Wang, Cheng Wang, Yiyang Chen, He Zhang, Chao Gou