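arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.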

2604.22853 2026-04-28 cs.CV cs.LG

FastAT Benchmark: A Comprehensive Framework for Fair Evaluation of Fast Adversarial Training Methods

Chao Pan, Xin Yao

Comments 9 pages, 2 figures, 3 tables. Code: https://github.com/fzjcdt/FastAT_Benchmark Project page: https://fzjcdt.github.io/FastAT_Benchmark/benchmark.html

2604.22852 2026-04-28 cs.RO cs.AI

SwarmDrive: Semantic V2V Coordination for Latency-Constrained Cooperative Autonomous Driving

Anjie Qiu, Donglin Wang, Zexin Fang, Sanket Partani, Hans D. Schotten

Comments 6 pages, 4 figures, submitted to VTC fall 2026 workshop on W9: 2nd Workshop on Shaping Future Connectivity: Emerging and Intelligent Technologies for Sustainable Vehicular Networks

2604.22851 2026-04-28 cs.CV cs.CL cs.RO

EgoDyn-Bench: Evaluating Ego-Motion Understanding in Vision-Centric Foundation Models for Autonomous Driving

Finn Rasmus Schäfer, Yuan Gao, Dingrui Wang, Thomas Stauner, Stephan Günnemann, Mattia Piccinini, Sebastian Schmidt, Johannes Betz

Comments 36 Pages, under review

2604.22850 2026-04-28 cs.CV cs.LG

Accelerating New Product Introduction for Visual Quality Inspection via Few-Shot Diffusion-Based Defect Synthesis

Serkan Hamdi Güğül, Kemal Levi, Burak Acar

Comments 10 pages, 6 figures. White paper from Relimetrics, Inc

2604.22848 2026-04-28 cs.CV

LunarDepthNet: Generation of Digital Elevation Models using Deep Learning and Monocular Satellite Images

Aaranay Aadi, Jai Gopal Singla, Amitabh, Nitant Dube, Praveen Kumar Shukla, Vijaypal Singh Dhaka

Comments Accepted by IEEE - 4th International Conference on Computer, Electronics, Electrical Engineering and their applications (IC2E3 2026)

2604.22847 2026-04-28 cs.CV

Dream-Cubed: Controllable Generative Modeling in Minecraft by Training on Billions of Cubes

Tim Merino, Sam Earle, Ryunosuke Iwai, Julian Togelius, Edoardo Cetin

2604.22846 2026-04-28 cs.CV

Unified Multi-Foundation-Model Slide Representation for Pan-Cancer Recognition and Text-Guided Tumor Localization

Tianyang Wang, Ziyu Su, Abdul Rehman Akbar, Usama Sajjad, Lina Gokhale, Charles Rabolli, Wei Chen, Anil Parwani, Muhammad Khalid Khan Niazi

2604.22842 2026-04-28 cs.CV eess.IV

EX-FIQA: Leveraging Intermediate Early eXit Representations from Vision Transformers for Face Image Quality Assessment

Guray Ozgur, Tahar Chettaoui, Eduarda Caldeira, Jan Niklas Kolf, Andrea Atzori, Fadi Boutros, Naser Damer

Comments Accepted at FG2026

Abstract

Face Image Quality Assessment is crucial for reliable face recognition systems, yet existing Vision Transformer-based approaches rely exclusively on final-layer representations, ignoring quality-relevant information captured at intermediate network depths. This paper presents the first comprehensive investigation of how intermediate representations within ViTs contribute to face quality assessment through early exit mechanisms and score fusion strategies. We systematically analyze all twelve transformer blocks of ViT-FIQA architectures, demonstrating that different depths capture distinct and complementary quality-relevant information, as evidenced by varying attention patterns and performance characteristics across network layers. We propose a score fusion framework that combines quality predictions from multiple transformer blocks without architectural modifications or additional training. Our early exit analysis reveals optimal performance-efficiency trade-offs, enabling significant computational savings while maintaining competitive performance. Through extensive evaluation across eight benchmark datasets using four FR models, we demonstrate that our fusion strategy improves upon single-exit approaches. Our proposed quality fusion approach employs depth-weighted averaging that assigns progressively higher importance to deeper transformer blocks, achieving the best quality assessment performance by effectively leveraging the hierarchical nature of feature learning in ViTs. Our work challenges the conventional wisdom that only deep features matter for face analysis, revealing that intermediate representations contain valuable information for quality assessment. The proposed framework offers practical benefits for real-world biometric systems by enabling adaptive computation based on resource constraints while maintaining competitive quality assessment capabilities.
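The depth-weighted averaging mentioned in the abstract can be pictured with a minimal sketch. The abstract does not give the actual weights, so the linear scheme below (weight 1 for the first block, up to n for the last), the function name `fuse_scores`, and the sample scores are all assumptions for illustration, not the authors' method.

```python
def fuse_scores(block_scores):
    """Depth-weighted average of per-block quality scores.

    block_scores[i] is the quality score predicted from transformer
    block i + 1. The weights 1, 2, ..., n are an assumed linear scheme
    that gives deeper blocks more influence; the paper may differ.
    """
    if not block_scores:
        raise ValueError("need at least one block score")
    weights = range(1, len(block_scores) + 1)
    weighted_total = sum(w * s for w, s in zip(weights, block_scores))
    return weighted_total / sum(weights)


# Hypothetical scores from three exits, shallow to deep.
fused = fuse_scores([0.2, 0.5, 0.8])
print(round(fused, 3))
```

Because the fusion only reweights existing predictions, it needs no extra training, which is consistent with the abstract's claim of no architectural modifications.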

2604.22841 2026-04-28 cs.CV eess.IV

ATTN-FIQA: Interpretable Attention-based Face Image Quality Assessment with Vision Transformers

Guray Ozgur, Tahar Chettaoui, Eduarda Caldeira, Jan Niklas Kolf, Marco Huber, Andrea Atzori, Naser Damer, Fadi Boutros

Comments Accepted at FG2026

2604.22840 2026-04-28 cs.CV cs.CL cs.MM

AeSlides: Incentivizing Aesthetic Layout in LLM-Based Slide Generation via Verifiable Rewards

Yiming Pan, Chengwei Hu, Xuancheng Huang, Can Huang, Mingming Zhao, Yuean Bi, Xiaohan Zhang, Aohan Zeng, Linmei Hu

Comments 21 pages, 25 figures, 9 tables

Abstract

Large language models (LLMs) have demonstrated strong potential in agentic tasks, particularly in slide generation. However, slide generation poses a fundamental challenge: the generation process is text-centric, whereas its quality is governed by visual aesthetics. This modality gap leads current models to frequently produce slides with aesthetically suboptimal layouts. Existing solutions typically rely either on heavy visual reflection, which incurs high inference cost yet yields limited gains; or on fine-tuning with large-scale datasets, which still provides weak and indirect aesthetic supervision. In contrast, the explicit use of aesthetic principles as supervision remains unexplored. In this work, we present AeSlides, a reinforcement learning framework with verifiable rewards for Aesthetic layout supervision in Slide generation. We introduce a suite of meticulously designed verifiable metrics to quantify slide layout quality, capturing key layout issues in an accurate, efficient, and low-cost manner. Leveraging these verifiable metrics, we develop a GRPO-based reinforcement learning method that directly optimizes slide generation models for aesthetically coherent layouts. With only 5K training prompts on GLM-4.7-Flash, AeSlides improves aspect ratio compliance from 36% to 85%, while reducing whitespace by 44%, element collisions by 43%, and visual imbalance by 28%. Human evaluation further shows a substantial improvement in overall quality, increasing scores from 3.31 to 3.56 (+7.6%), outperforming both model-based reward optimization and reflection-based agentic approaches, and even edging out Claude-Sonnet-4.5. These results demonstrate that such a verifiable aesthetic paradigm provides an efficient and scalable approach to aligning slide generation with human aesthetic preferences. Our repository is available at https://github.com/ympan0508/aeslides.
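The reported relative gain in human evaluation can be checked from the two scores in the abstract. This is a small arithmetic check only; the variable names are illustrative.

```python
# Human evaluation scores quoted in the abstract.
score_before = 3.31
score_after = 3.56

# Relative improvement as a percentage of the starting score.
relative_gain = (score_after - score_before) / score_before * 100
print(f"{relative_gain:.1f}%")  # 7.6%
```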

2604.22839 2026-04-28 cs.CV cs.AI

From Skeletons to Pixels: Few-Shot Precise Event Spotting via Representation and Prediction Distillation

Zhong Han Ervin Yeoh, Jiang Kan

Comments 39 pages, 4 figures, ISACE 2026

2604.22838 2026-04-28 cs.CV cs.AI

Neural Network Optimization Reimagined: Decoupled Techniques for Scratch and Fine-Tuning

Xin Ning, Qiankun Li, Xiaolong Huang, Qiupu Chen, Feng He, Weijun Li, Prayag Tiwari, Xinwang Liu

Comments IEEE T-PAMI

Journal ref IEEE T-PAMI 2026

2604.22837 2026-04-28 cs.CV cs.AI

OAMVOS:2nd Report for 5th PVUW MOSE Track

Deshui Miao, Xingsen Huang, Yameng Gu, Xiaogang Yu, Xin Li, Ming-Hsuan Yang

2604.22836 2026-04-28 cs.CV

AgentRVOS for MeViS-Text Track of 5th PVUW Challenge: 3rd Method

Deshui Miao, Chao Yang, Chao Tian, Guoqing Zhu, Kai Yang, Zhifan Mo, Xin Li

2604.22835 2026-04-28 cs.CV cs.AI

ParkingScenes: A Structured Dataset for End-to-End Autonomous Parking in Simulation Scenes

Haonan Chen, Kaiwen Xiao, Bin Tian, Jun Fu

Comments Accepted by CAC 2025

Abstract

Autonomous parking remains a critical yet challenging task in intelligent driving systems, particularly within constrained urban environments where maneuvering space is limited and precise control is essential. While recent advances in end-to-end learning have shown great promise, the lack of high-quality, structured datasets tailored for parking scenarios remains a significant bottleneck. To address this gap, we present ParkingScenes, a comprehensive multimodal dataset specifically designed for end-to-end autonomous parking in simulated scenes. Built on the CARLA simulator, ParkingScenes features structured parking trajectories generated by a Hybrid A* planner and a Model Predictive Controller (MPC), providing accurate and reproducible supervision signals. The dataset includes 16 reverse-in and 6 parallel parking scenarios, each executed under two pedestrian conditions (present and absent), resulting in 704 structured episodes and approximately 105000 frames. Each scenario is repeated 16 times to ensure consistent coverage. Each frame contains synchronized data from four RGB cameras, four depth sensors, vehicle motion states, and Bird's-Eye View (BEV) representations, enabling rich multimodal fusion and context-aware learning. To demonstrate the utility of our dataset, we compare models trained on ParkingScenes with those trained on unstructured, manually collected simulation data under identical conditions. Results show significant improvements in performance, underscoring the effectiveness of structured supervision for robust and accurate parking policy learning. By releasing both the dataset and the collection framework, ParkingScenes establishes a scalable and reproducible benchmark for advancing learning-based autonomous parking systems. The dataset and collection framework will be released at: https://github.com/haonan-ai/ParkingScenes
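The episode count in the abstract follows from the scenario numbers it states. The short check below uses only figures from the abstract; the variable names are illustrative, and the average frames per episode is a derived estimate, not a number the authors report.

```python
# Figures quoted in the abstract.
reverse_in_scenarios = 16
parallel_scenarios = 6
pedestrian_conditions = 2   # pedestrians present or absent
repeats_per_scenario = 16
total_frames = 105_000      # stated as approximate

# Every scenario runs under both pedestrian conditions, 16 times each.
episodes = (
    (reverse_in_scenarios + parallel_scenarios)
    * pedestrian_conditions
    * repeats_per_scenario
)
print(episodes)  # 704

# Derived estimate of episode length in frames.
frames_per_episode = round(total_frames / episodes)
print(frames_per_episode)  # 149
```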

2604.22834 2026-04-28 cs.CV cs.LG

WebSerial Vision Training for Microcontrollers: A Browser-Based Companion to On-Device CNN Training

Jeremy Ellis

Comments 29 pages, 16 figures, 5 tables. Paper 2 of the webmcu-ai series. All source code and supplemental results available at: https://github.com/webmcu-ai/webmcu-vision-web

2604.22832 2026-04-28 cs.CV cs.AI cs.LG

Intervention-Aware Multiscale Representation Learning from Imaging Phenomics and Perturbation Transcriptomics

Jiayuan Chen, Ruoqi Liu, Zishan Gu, Ping Zhang

Comments CVPR 2026 main conference

2604.22830 2026-04-28 cs.CV cs.LG

2D Pre-Training for 3D Pose Estimation

Liyao Jiang, Ruichen Chen, Keith G. Mills

Comments This work was completed as a graduate course project more than four years prior to this preprint. It is shared for archival and educational purposes. We open-source our code fork here: https://github.com/ECE740F21T01/pytorch-pose-hg-3d

2604.22829 2026-04-28 cs.CV

Lost in the Vibrations: Vision Language Models Fail the Dynamic Gauges Test

Tairan Fu, Francisco Javier Santos-Martín, Javier Conde, Pedro Reviriego, Elena Merino-Gómez

2604.22828 2026-04-28 cs.CV cs.AI

MetaEarth3D: Unlocking World-scale 3D Generation with Spatially Scalable Generative Modeling

Jinqi Cao, Zhiping Yu, Baihong Lin, Chenyang Liu, Zhenwei Shi, Zhengxia Zou

Comments Project Page: https://jinqicao.github.io/metaearth3d/

2604.22827 2026-04-28 cs.CV cs.LG

DGHMesh: A Large-scale Dual-radar mmWave Dataset and Generalization-focused Benchmark for Human Mesh Reconstruction

Rongxiao Guo, Qingchao Chen

2604.22826 2026-04-28 cs.CV cs.LG

Shape: A Self-Supervised 3D Geometry Foundation Model for Industrial CAD Analysis

Bayangmbe Mounmo, Sam Chien, Mile Mitrovic

Comments 19 pages, 2 figures

2604.22825 2026-04-28 cs.CV cs.AI

SGP-SAM: Self-Gated Prompting for Transferring 3D Segment Anything Models to Lesion Segmentation

Zixuan Tang, Shen Zhao

2604.22823 2026-04-28 cs.CV cs.AI

PivotMerge: Bridging Heterogeneous Multimodal Pre-training via Post-Alignment Model Merging

Zibo Shao, Baochen Xiong, Xiaoshan Yang, Yaguang Song, Qimeng Zhang, Haifeng Chen, Changsheng Xu

2604.22822 2026-04-28 cs.CV cs.AI

DO-Bench: An Attributable Benchmark for Diagnosing Object Hallucination in Vision-Language Models

JiYang Wang, Jiawei Chen, Mengqi Xiao, Yu Cheng, Yangfu Li, Zhaoxia Yin

2604.22808 2026-04-28 cs.CV cs.AI eess.IV

FreqFormer: Hierarchical Frequency-Domain Attention with Adaptive Spectral Routing for Long-Sequence Video Diffusion Transformers

Haopeng Jin

Comments 24 pages, 17 figures, 14 tables, Technical Report

2604.22805 2026-04-28 cs.CV cs.AI cs.SY eess.SY

See No Evil: Semantic Context-Aware Privacy Risk Detection for AR

Jialu Liu, Yao Li, Zhuoheng Li, Huining Li, Ying Chen

Comments Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2026

2604.22787 2026-04-28 cs.LG cs.AI

Conformal PM2.5 Mapping Under Spatial Covariate Shift: Satellite-Reanalysis Fusion for Africa's Green Industrial Transition

Yaw Osei Adjei, Davis Opoku, Ephraim Abotsi, Kwadwo Owusu Amanqua, Oliver Kornyo, Elisha Soglo-Ahianyo, Cephas Anertey Abbey

Comments 9 pages, 8 figures, 6 tables. Index Terms: PM2.5 mapping, conformal prediction, covariate shift, spatial cross-validation, air quality, green industrialisation, trustworthy AI, Africa

2604.22786 2026-04-28 cs.LG

AutoCompress: Critical Layer Isolation for Efficient Transformer Compression

Archit Thorat

Comments 6 pages, 2 tables. Code available at https://github.com/NotDrake100/autocompress

2604.22785 2026-04-28 cs.LG

CoFi-PGMA: Counterfactual Policy Gradients under Filtered Feedback for Multi-Agent LLMs

Stela Tong, Elai Ben-Gal

Comments 17 pages, 0 figures
