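arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.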

2604.07622 2026-04-10 cs.CL cs.AI cs.LG

DIVERSED: Relaxed Speculative Decoding via Dynamic Ensemble Verification

Ziyi Wang, Siva Rajesh Kasa, Ankith M S, Santhosh Kumar Kasa, Jiaru Zou, Sumit Negi, Ruqi Zhang, Nan Jiang, Qifan Song

Comments 35 pages, 9 figures, accepted at AISTATS 2026

2604.07615 2026-04-10 cs.CL

ADAG: Automatically Describing Attribution Graphs

Aryaman Arora, Zhengxuan Wu, Jacob Steinhardt, Sarah Schwettmann

2604.07612 2026-04-10 cs.SD cs.AI

Towards Real-Time Human-AI Musical Co-Performance: Accompaniment Generation with Latent Diffusion Models and MAX/MSP

Tornike Karchkhadze, Shlomo Dubnov

Comments 12 pages, 6 figures

2604.07610 2026-04-10 cs.LG cs.NE

Auto-Configured Networks for Multi-Scale Multi-Output Time-Series Forecasting

Yumeng Zha, Shengxiang Yang, Xianpeng Wang

2604.07607 2026-04-10 cs.RO cs.CV

EgoVerse: An Egocentric Human Dataset for Robot Learning from Around the World

Ryan Punamiya, Simar Kareer, Zeyi Liu, Josh Citron, Ri-Zhao Qiu, Xiongyi Cai, Alexey Gavryushin, Jiaqi Chen, Davide Liconti, Lawrence Y. Zhu, Patcharapong Aphiwetsa, Baoyu Li, Aniketh Cheluva, Pranav Kuppili, Yangcen Liu, Dhruv Patel, Aidan Gao, Hye-Young Chung, Ryan Co, Renee Zbizika, Jeff Liu, Xiaomeng Xu, Haoyu Xiong, Geng Chen, Sebastiano Oliani, Chenyu Yang, Xi Wang, James Fort, Richard Newcombe, Josh Gao, Jason Chong, Garrett Matsuda, Aseem Doriwala, Marc Pollefeys, Robert Katzschmann, Xiaolong Wang, Shuran Song, Judy Hoffman, Danfei Xu

2604.07606 2026-04-10 cs.CV

Bootstrapping Sign Language Annotations with Sign Language Models

Colin Lea, Vasileios Baltatzis, Connor Gillis, Raja Kushalnagar, Lorna Quandt, Leah Findlater

Comments Accepted to CVPR Findings 2026

2604.07603 2026-04-10 cs.LG

Implicit Regularization and Generalization in Overparameterized Neural Networks

Zeran Johannsen

Comments 12 pages, 5 figures

2604.07593 2026-04-10 cs.AI

Too long; didn't solve

Lucía M. Cabrera, Isaac Saxton-Knight

2604.07592 2026-04-10 cs.RO

Spatio-Temporal Grounding of Large Language Models from Perception Streams

Jacob Anderson, Bardh Hoxha, Georgios Fainekos, Hideki Okamoto, Danil Prokhorov

2604.07584 2026-04-10 cs.AI

From Papers to Property Tables: A Priority-Based LLM Workflow for Materials Data Extraction

Koushik Rameshbabu, Jing Luo, Ali Shargh, Khalid A. El-Awady, Jaafar A. El-Awady

Abstract

Scientific data are widely dispersed across research articles and are often reported inconsistently across text, tables, and figures, making manual data extraction and aggregation slow and error-prone. We present a prompt-driven, hierarchical workflow that uses a large language model (LLM) to automatically extract and reconstruct structured, shot-level shock-physics experimental records by integrating information distributed across text, tables, figures, and physics-based derivations from full-text published research articles, using alloy spall strength as a representative case study. The pipeline targeted 37 experimentally relevant fields per shot and applied a three-level priority strategy: (T1) direct extraction from text/tables, (T2) physics-based derivation using verified governing relations, and (T3) digitization from figures when necessary. Extracted values were normalized to canonical units, tagged by priority for traceability, and validated with physics-based consistency and plausibility checks. Evaluated on a benchmark of 30 published research articles comprising 11,967 evaluated data points, the workflow achieved high overall accuracy, with priority-wise accuracies of 94.93% (T1), 92.04% (T2), and 83.49% (T3), and an overall weighted accuracy of 94.69%. Cross-model testing further indicated strong agreement for text/table and equation-derived fields, with lower agreement for figure-based extraction. Implementation through an API interface demonstrated the scalability of the approach, achieving consistent extraction performance and, in a subset of test cases, matching or exceeding chat-based accuracy. This workflow demonstrates a practical approach for converting unstructured technical literature into traceable, analysis-ready datasets without task-specific fine-tuning, enabling scalable database construction in materials science.

2604.07578 2026-04-10 cs.CV

MSGL-Transformer: A Multi-Scale Global-Local Transformer for Rodent Social Behavior Recognition

Muhammad Imran Sharif, Doina Caragea

Comments 25 pages, 10 figures, submitted to Scientific Reports

2604.07577 2026-04-10 cs.CV

Event-Level Detection of Surgical Instrument Handovers in Videos with Interpretable Vision Models

Katerina Katsarou, George Zountsas, Karam Tomotaki-Dawoud, Alexander Ehrenhoefer, Paul Chojecki, David Przewozny, Igor Maximilian Sauer, Amira Mouakher, Sebastian Bosse

Comments 12 Pages, 6 figures, CVPR 2026 Workshop AI4RWC

2604.07575 2026-04-10 cs.RO

Robust Multi-Agent Target Tracking in Intermittent Communication Environments via Analytical Belief Merging

Mohamed Abdelnaby, Samuel Honor, Kevin Leahy

2604.07574 2026-04-10 cs.CV cs.NA math.NA

Mathematical Analysis of Image Matching Techniques

Oleh Samoilenko

Comments 16 pages, 5 figures, 1 table

2604.07569 2026-04-10 cs.LG cs.AI cs.CL cs.IT math.IT

Learning is Forgetting: LLM Training As Lossy Compression

Henry C. Conklin, Tom Hosking, Tan Yi-Chern, Julian Gold, Jonathan D. Cohen, Thomas L. Griffiths, Max Bartolo, Seraphina Goldfarb-Tarrant

Comments 12 page core paper, 16 page Appendix - A shorter version with fewer visuals appears at ICLR 2026

2604.07563 2026-04-10 cs.CV

On the Uphill Battle of Image frequency Analysis

Nader Bazyari, Hedieh Sajedi

Comments Paper was accepted to the IPCV 2021 track of the CSCE 2021 congress through peer review but was not published. https://www.american-cse.org/csce2021/publisher

2604.07559 2026-04-10 cs.AI

Dual-Loop Control in DCVerse: Advancing Reliable Deployment of AI in Data Centers via Digital Twins

Qingang Zhang, Yuejun Yan, Guangyu Wu, Siew-Chien Wong, Jimin Jia, Zhaoyang Wang, Yonggang Wen

2604.07557 2026-04-10 cs.LG q-bio.QM

Validated Synthetic Patient Generation for Small Longitudinal Cohorts: Coagulation Dynamics Across Pregnancy

Jeffrey D. Varner, Maria Cristina Bravo, Carole McBride, Thomas Orfeo, Ira Bernstein

2604.07553 2026-04-10 cs.CL cs.AI

TR-EduVSum: A Turkish-Focused Dataset and Consensus Framework for Educational Video Summarization

Figen Eğin, Aytuğ Onan

Comments 8 pages, 2 figures, 3 tables. Accepted at the Second Workshop on Natural Language Processing for Turkic Languages (SIGTURK 2026), EACL 2026, Rabat, Morocco

2604.07546 2026-04-10 cs.AI

Agentic Copyright, Data Scraping & AI Governance: Toward a Coasean Bargain in the Era of Artificial Intelligence

Paulius Jurcys, Mark Fenwick

2604.07535 2026-04-10 cs.AI

Trust the AI, Doubt Yourself: The Effect of Urgency on Self-Confidence in Human-AI Interaction

Baran Shajari, Xiaoran Liu, Kyanna Dagenais, Istvan David

2604.07525 2026-04-10 cs.LG cs.SY eess.SY

Learning Markov Processes as Sum-of-Square Forms for Analytical Belief Propagation

Peter Amorese, Morteza Lahijanian

Comments Twenty-Ninth Annual Conference on Artificial Intelligence and Statistics (AISTATS 2026)

2604.07518 2026-04-10 cs.CL

Decompose, Look, and Reason: Reinforced Latent Reasoning for VLMs

Mengdan Zhu, Senhao Cheng, Liang Zhao

2604.07517 2026-04-10 cs.RO

Grasp as You Dream: Imitating Functional Grasping from Generated Human Demonstrations

Chao Tang, Jiacheng Xu, Haofei Lu, Bolin Zou, Wenlong Dong, Hong Zhang, Danica Kragic

2604.07513 2026-04-10 cs.LG cs.AI cs.CL cs.CY

SYN-DIGITS: A Synthetic Control Framework for Calibrated Digital Twin Simulation

Grace Jiarui Fan, Chengpiao Huang, Tianyi Peng, Kaizheng Wang, Yuhang Wu

2604.07492 2026-04-10 cs.LG cs.AI

Cluster Attention for Graph Machine Learning

Oleg Platonov, Liudmila Prokhorenkova

2604.07490 2026-04-10 cs.CL cs.AI

Enabling Intrinsic Reasoning over Dense Geospatial Embeddings with DFR-Gemma

Xuechen Zhang, Aviv Slobodkin, Joydeep Paul, Mandar Sharma, Samet Oymak, Shravya Shetty, Gautam Prasad

2604.07487 2026-04-10 cs.AI

CLEAR: Context Augmentation from Contrastive Learning of Experience via Agentic Reflection

Linbo Liu, Guande Wu, Han Ding, Yawei Wang, Qiang Zhou, Yuzhe Lu, Zhichao Xu, Huan Song, Panpan Xu, Lin Lee Cheong

2604.07480 2026-04-10 cs.RO cs.AI cs.FL

Active Reward Machine Inference From Raw State Trajectories

Mohamad Louai Shehab, Antoine Aspeel, Necmiye Ozay

2604.07477 2026-04-10 cs.CV eess.IV

SMFD-UNet: Semantic Face Mask Is The Only Thing You Need To Deblur Faces

Abduz Zami

Comments BSc thesis

Abstract

Facial image deblurring, the restoration of high-quality images from blurry inputs, is an essential computer vision task for applications including facial identification, forensic analysis, photographic enhancement, and medical imaging diagnostics. Traditional deblurring techniques, often based on general image priors, struggle to capture the structural and identity-specific features of human faces. To address this, we present SMFD-UNet (Semantic Mask Fusion Deblurring UNet), a new lightweight framework that uses semantic face masks to guide the deblurring process, removing the need for high-quality reference photos. Our two-step method first uses a UNet-based semantic mask generator to extract detailed facial component masks (e.g., eyes, nose, mouth) directly from blurry photos. Sharp, high-fidelity facial images are then produced by integrating these masks with the blurry input using a multi-stage feature fusion technique within a computationally efficient UNet framework. For robustness, we created a randomized blurring pipeline that approximates real-world conditions by simulating around 1.74 trillion degradation scenarios. Evaluated on the CelebA dataset, SMFD-UNet outperforms state-of-the-art models, attaining higher Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) while preserving satisfactory naturalness measures, including NIQE, LPIPS, and FID. SMFD-UNet is powered by Residual Dense Convolution Blocks (RDC), a multi-stage feature fusion strategy, efficient upsampling techniques, attention mechanisms such as CBAM, and post-processing techniques; its lightweight design supports scalability and efficiency, making it a flexible solution for facial image restoration research and practical applications.
