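arXivDaily: a daily academic digest that syncs the full arXiv dataset and provides AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.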

2603.07935 2026-03-10 cs.SD

Unsupervised Domain Adaptation for Audio Deepfake Detection with Modular Statistical Transformations

Urawee Thani, Gagandeep Singh, Priyanka Singh

Comments 9 pages, 4 figures

Abstract

Audio deepfake detection systems trained on one dataset often fail when deployed on data from different sources due to distributional shifts in recording conditions, synthesis methods, and acoustic environments. We present a modular pipeline for unsupervised domain adaptation that combines pre-trained Wav2Vec 2.0 embeddings with statistical transformations to improve cross-domain generalization without requiring labeled target data. Our approach applies power transformation for feature normalization, ANOVA-based feature selection, joint PCA for domain-agnostic dimensionality reduction, and CORAL alignment to match source and target covariance structures before classification via logistic regression. We evaluate on two cross-domain transfer scenarios: ASVspoof 2019 LA to Fake-or-Real (FoR) and FoR to ASVspoof, achieving 62.7-63.6% accuracy with balanced performance across real and fake classes. Systematic ablation experiments reveal that feature selection (+3.5%) and CORAL alignment (+3.2%) provide the largest individual contributions, with the complete pipeline improving accuracy by 10.7% over baseline. While performance is modest compared to within-domain detection (94-96%), our pipeline offers transparency and modularity, making it suitable for deployment scenarios requiring interpretable decisions.
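
The CORAL alignment step named in this abstract can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names, the `eps` regularizer, and the choice to shift the aligned features to the target mean are assumptions made for illustration. The idea is to whiten the source features with the source covariance and then re-color them with the target covariance.

```python
import numpy as np

def _sym_matrix_power(mat, power):
    """Raise a symmetric positive-definite matrix to a real power via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    eigvals = np.clip(eigvals, 1e-12, None)  # guard against tiny negative values
    return (eigvecs * eigvals**power) @ eigvecs.T

def coral_align(source, target, eps=1e-3):
    """Re-color centered source features so their covariance matches the target's.

    eps (assumed here) regularizes both covariance matrices so they are invertible.
    """
    d = source.shape[1]
    src_centered = source - source.mean(axis=0)
    cov_s = np.cov(source, rowvar=False) + eps * np.eye(d)
    cov_t = np.cov(target, rowvar=False) + eps * np.eye(d)
    whiten = _sym_matrix_power(cov_s, -0.5)   # removes source correlations
    recolor = _sym_matrix_power(cov_t, 0.5)   # applies target correlations
    # Shifting to the target mean is an illustrative choice, not stated in the abstract.
    return src_centered @ whiten @ recolor + target.mean(axis=0)
```

Only unlabeled target features are needed, which is consistent with the unsupervised setting described above; a classifier would then be trained on the aligned source features.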

2603.07931 2026-03-10 cs.CL

BRIDGE: Benchmark for multi-hop Reasoning In long multimodal Documents with Grounded Evidence

Biao Xiang, Soyeon Caren Han, Yihao Ding

2603.07929 2026-03-10 cs.CV

A Hybrid Vision Transformer Approach for Mathematical Expression Recognition

Anh Duy Le, Van Linh Pham, Vinh Loi Ly, Nam Quan Nguyen, Huu Thang Nguyen, Tuan Anh Tran

Comments Accepted as oral presentation at DICTA 2022

2603.07928 2026-03-10 cs.RO

Omnidirectional Humanoid Locomotion on Stairs via Unsafe Stepping Penalty and Sparse LiDAR Elevation Mapping

Yuzhi Jiang, Yujun Liang, Junhao Li, Han Ding, Lijun Zhu

2603.07924 2026-03-10 cs.LG cs.CY

Semantic Risk Scoring of Aggregated Metrics: An AI-Driven Approach for Healthcare Data Governance

Mohammed Omer Shakeel Ahmed

Comments 6 pages, 3 figures, 1 Table, Accepted for publication in the 21st Int. Conference on Data Science (ICDATA 25)

Abstract

Large healthcare institutions typically operate multiple business intelligence (BI) teams segmented by domain, including clinical performance, fundraising, operations, and compliance. Due to HIPAA, FERPA, and IRB restrictions, these teams face challenges in sharing patient-level data needed for analytics. To mitigate this, a metric aggregation table is proposed, which is a precomputed, privacy-compliant summary. These abstractions enable decision-making without direct access to sensitive data. However, even aggregated metrics can inadvertently lead to privacy risks if constructed without rigorous safeguards. A modular AI framework is proposed that evaluates SQL-based metric definitions for potential overexposure using both semantic and syntactic features. Specifically, the system parses SQL queries into abstract syntax trees (ASTs), extracts sensitive patterns (e.g., fine-grained GROUP BY on ZIP code or gender), and encodes the logic using pretrained CodeBERT embeddings. These are fused with structural features and passed to an XGBoost classifier trained to assign risk scores. Queries that surpass the risk threshold (e.g., > 0.85) are flagged and returned with human-readable explanations. This enables proactive governance, preventing statistical disclosure before deployment. This implementation demonstrates strong potential for cross-departmental metric sharing in healthcare while maintaining compliance and auditability. The system also promotes role-based access control (RBAC), supports zero-trust data architectures, and aligns with national data modernization goals by ensuring that metric pipelines are explainable, privacy-preserving, and AI-auditable by design. Unlike prior work that relies on runtime data access to flag privacy violations, the proposed framework performs static, explainable detection at the query level, enabling pre-execution protection and audit readiness.
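
To make the idea of static, pre-execution inspection concrete, here is a small Python sketch that pulls GROUP BY columns out of a query string and reports the sensitive ones. It is a deliberately simplified stand-in: it uses a regular expression rather than a real AST parser, and it omits the CodeBERT embeddings and XGBoost classifier described in the abstract. The column names, function names, and explanation wording are all hypothetical.

```python
import re

# Hypothetical quasi-identifier column names; the abstract only mentions ZIP code and gender.
SENSITIVE_COLUMNS = {"zip_code", "gender"}

def group_by_columns(sql):
    """Return the lower-cased column names in the first GROUP BY clause, if any."""
    match = re.search(
        r"group\s+by\s+(.+?)(?:\bhaving\b|\border\s+by\b|\blimit\b|;|$)",
        sql,
        flags=re.IGNORECASE | re.DOTALL,
    )
    if not match:
        return []
    # Drop table qualifiers such as "v." so "v.zip_code" becomes "zip_code".
    return [c.strip().lower().split(".")[-1] for c in match.group(1).split(",") if c.strip()]

def structural_features(sql):
    """Collect simple structural features without executing the query."""
    cols = group_by_columns(sql)
    return {
        "n_group_by": len(cols),
        "sensitive_group_by": sorted(set(cols) & SENSITIVE_COLUMNS),
    }

def explain(sql):
    """Produce one human-readable message per sensitive GROUP BY column."""
    feats = structural_features(sql)
    return [f"GROUP BY on sensitive column '{c}'" for c in feats["sensitive_group_by"]]
```

Because the check reads only the query text, it can run before the query ever touches patient data. A production system would likely need a proper SQL parser to handle subqueries, aliases, and dialect differences.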

2603.07918 2026-03-10 cs.CV

Enhancing Unregistered Hyperspectral Image Super-Resolution via Unmixing-based Abundance Fusion Learning

Yingkai Zhang, Tao Zhang, Jing Nie, Ying Fu

2603.07915 2026-03-10 cs.AI

Ares: Adaptive Reasoning Effort Selection for Efficient LLM Agents

Jingbo Yang, Bairu Hou, Wei Wei, Yujia Bao, Shiyu Chang

2603.07912 2026-03-10 cs.CV

Geometric Transformation-Embedded Mamba for Learned Video Compression

Hao Wei, Yanhui Zhou, Chenyang Ge

2603.07911 2026-03-10 cs.CV

Beyond Heuristic Prompting: A Concept-Guided Bayesian Framework for Zero-Shot Image Recognition

Hui Liu, Kecheng Chen, Jialiang Wang, Xianming Liu, Wenya Wang, Haoliang Li

Comments 19 pages, Accepted by CVPR 2026

2603.07909 2026-03-10 cs.RO cs.AI

Long-Short Term Agents for Pure-Vision Bronchoscopy Robotic Autonomy

Junyang Wu, Mingyi Luo, Fangfang Xie, Minghui Zhang, Hanxiao Zhang, Chunxi Zhang, Junhao Wang, Jiayuan Sun, Yun Gu, Guang-Zhong Yang

2603.07901 2026-03-10 cs.RO cs.LG

NaviDriveVLM: Decoupling High-Level Reasoning and Motion Planning for Autonomous Driving

Ximeng Tao, Pardis Taghavi, Dimitar Filev, Reza Langari, Gaurav Pandey

2603.07899 2026-03-10 cs.LG stat.ML

Bayesian Transformer for Probabilistic Load Forecasting in Smart Grids

Sajib Debnath, Md. Uzzal Mia

Abstract

The reliable operation of modern power grids requires probabilistic load forecasts with well-calibrated uncertainty estimates. However, existing deep learning models produce overconfident point predictions that fail catastrophically under extreme weather distributional shifts. This study proposes a Bayesian Transformer (BT) framework that integrates three complementary uncertainty mechanisms into a PatchTST backbone: Monte Carlo Dropout for epistemic parameter uncertainty, variational feed-forward layers with log-uniform weight priors, and stochastic attention with learnable Gaussian noise perturbations on pre-softmax logits, representing, to the best of our knowledge, the first application of Bayesian attention to probabilistic load forecasting. A seven-level multi-quantile pinball-loss prediction head and post-training isotonic regression calibration produce sharp, near-nominally covered prediction intervals. Evaluation on five grid datasets (PJM, ERCOT, ENTSO-E Germany, France, and Great Britain) augmented with NOAA covariates across 24-, 48-, and 168-hour horizons demonstrates state-of-the-art performance. On the primary benchmark (PJM, H=24h), BT achieves a CRPS of 0.0289, improving 7.4% over Deep Ensembles and 29.9% over the deterministic LSTM, with 90.4% PICP at the 90% nominal level and the narrowest prediction intervals (4,960 MW) among all probabilistic baselines. During heat-wave and cold-snap events, BT maintained 89.6% and 90.1% PICP respectively, versus 64.7% and 67.2% for the deterministic LSTM, confirming that Bayesian epistemic uncertainty naturally widens intervals for out-of-distribution inputs. Calibration remained stable across all horizons (89.8-90.4% PICP), while ablation confirmed that each component contributed distinct value. The calibrated outputs directly support risk-based reserve sizing, stochastic unit commitment, and demand response activation.
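
Two quantities in this abstract, the pinball loss used to train quantile heads and the PICP used to report interval coverage, have standard definitions that a short NumPy sketch can make concrete. The function names are illustrative, and this is not the authors' implementation; the seven quantile levels they use are not stated in the abstract and are not assumed here.

```python
import numpy as np

def pinball_loss(y_true, y_pred, quantile):
    """Mean pinball (quantile) loss for a single quantile level in (0, 1)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Under-prediction is weighted by quantile, over-prediction by (1 - quantile).
    return float(np.mean(np.maximum(quantile * diff, (quantile - 1.0) * diff)))

def picp(y_true, lower, upper):
    """Prediction interval coverage probability: share of targets inside [lower, upper]."""
    y_true = np.asarray(y_true, dtype=float)
    inside = (y_true >= np.asarray(lower)) & (y_true <= np.asarray(upper))
    return float(np.mean(inside))
```

A well-calibrated interval at the 90% nominal level should give a PICP close to 0.90, which is how figures such as 90.4% in the abstract are read.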

2603.07898 2026-03-10 cs.CV cs.LG

Revisiting Unknowns: Towards Effective and Efficient Open-Set Active Learning

Chen-Chen Zong, Yu-Qi Chi, Xie-Yang Wang, Yan Cui, Sheng-Jun Huang

Comments Accepted to CVPR 2026

2603.07897 2026-03-10 cs.LG

LeJOT-AutoML: LLM-Driven Feature Engineering for Job Execution Time Prediction in Databricks Cost Optimization

Lizhi Ma, Yi-Xiang Hu, Yihui Ren, Feng Wu, Xiang-Yang Li

2603.07896 2026-03-10 cs.AI cs.LG

SMGI: A Structural Theory of General Artificial Intelligence

Aomar Osmani

Comments Preprint. 77 pages, 1 figure, 3 tables

2603.07895 2026-03-10 cs.CV

MINT: Molecularly Informed Training with Spatial Transcriptomics Supervision for Pathology Foundation Models

Minsoo Lee, Jonghyun Kim, Juseung Yun, Sunwoo Yu, Jongseong Jang

2603.07891 2026-03-10 cs.AI

A Lightweight Traffic Map for Efficient Anytime LaCAM*

Bojie Shen, Yue Zhang, Zhe Chen, Daniel Harabor

2603.07890 2026-03-10 cs.AI cs.CV

Visualizing Coalition Formation: From Hedonic Games to Image Segmentation

Pedro Henrique de Paula França, Lucas Lopes Felipe, Daniel Sadoc Menasché

Comments The First Workshop on AI for Mechanism Design and Strategic Decision Making -- Workshop AIMS at ICLR 2026

2603.07889 2026-03-10 cs.CV

Structure and Progress Aware Diffusion for Medical Image Segmentation

Siyuan Song, Guyue Hu, Chenglong Li, Dengdi Sun, Zhe Jin, Jin Tang

2603.07888 2026-03-10 cs.CV cs.AI cs.LG

VLM-SubtleBench: How Far Are VLMs from Human-Level Subtle Comparative Reasoning?

Minkyu Kim, Sangheon Lee, Dongmin Park

Comments ICLR 2026

2603.07887 2026-03-10 cs.LG cs.AI cs.CL math.ST stat.ML stat.TH

Reject, Resample, Repeat: Understanding Parallel Reasoning in Language Model Inference

Noah Golowich, Fan Chen, Dhruv Rohatgi, Raghav Singhal, Carles Domingo-Enrich, Dylan J. Foster, Akshay Krishnamurthy

2603.07886 2026-03-10 cs.CL cs.AI

CCR-Bench: A Comprehensive Benchmark for Evaluating LLMs on Complex Constraints, Control Flows, and Real-World Cases

Xiaona Xue, Yiqiao Huang, Jiacheng Li, Yuanhang Zheng, Huiqi Miao, Yunfei Ma, Rui Liu, Xinbao Sun, Minglu Liu, Fanyu Meng, Chao Deng, Junlan Feng

2603.07885 2026-03-10 cs.RO

Identifying Influential Actions in Human-Robot Interactions

Haoyang Jiang, Chenfei Xu, Yuya Okadome, Yukata Nakamura

Comments Presented at the 30th International Symposium on Artificial Life and Robotics (AROB 30th). Beppu, Japan, January 2025

2603.07875 2026-03-10 cs.RO

Choose What to Observe: Task-Aware Semantic-Geometric Representations for Visuomotor Policy

Haoran Ding, Liang Ma, Yaxun Yang, Wen Yang, Tianyu Liu, Anqing Duan, Xiaodan Liang, Dezhen Song, Ivan Laptev, Yoshihiko Nakamura

2603.07874 2026-03-10 cs.CV cs.LG

Toward Unified Multimodal Representation Learning for Autonomous Driving

Ximeng Tao, Dimitar Filev, Gaurav Pandey

2603.07868 2026-03-10 cs.AI cs.LG

Hospitality-VQA: Decision-Oriented Informativeness Evaluation for Vision-Language Models

Jeongwoo Lee, Baek Duhyeong, Eungyeol Han, Soyeon Shin, Gukin han, Seungduk Kim, Jaehyun Jeon, Taewoo Jeong

Comments Accepted at EACL 2026 SRW. 16 pages

2603.07867 2026-03-10 cs.LG cs.AI

Slumbering to Precision: Enhancing Artificial Neural Network Calibration Through Sleep-like Processes

Jean Erik Delanois, Aditya Ahuja, Giri P. Krishnan, Maxim Bazhenov

2603.07866 2026-03-10 cs.RO cs.LG cs.SY eess.SY

Viewpoint-Agnostic Grasp Pipeline using VLM and Partial Observations

Dilermando Almeida, Juliano Negri, Guilherme Lazzarini, Thiago H. Segreto, Ranulfo Bezerra, Ricardo V. Godoy, Marcelo Becker

2603.07865 2026-03-10 cs.SD cs.CV eess.AS

SoundWeaver: Semantic Warm-Starting for Text-to-Audio Diffusion Serving

Ayush Barik, Sofia Stoica, Nikhil Sarda, Arnav Kethana, Abhinav Khanduja, Muchen Xu, Fan Lai

Comments Submitted to INTERSPEECH 2026

2603.07853 2026-03-10 cs.AI cs.CL cs.IR

SynPlanResearch-R1: Encouraging Tool Exploration for Deep Research with Synthetic Plans

Hansi Zeng, Zoey Li, Yifan Gao, Chenwei Zhang, Xiaoman Pan, Tao Yang, Fengran Mo, Jiacheng Lin, Xian Li, Jingbo Shang