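arXivDaily is a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations. It covers artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.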

2311.08007 2026-03-03 cs.CV

Velocity Disambiguation for Video Frame Interpolation

Zhihang Zhong, Yiming Zhang, Wei Wang, Xiao Sun, Yu Qiao, Gurunandan Krishnan, Sizhuo Ma, Jian Wang

Comments ECCV2024 Oral; TPAMI

Abstract

Existing video frame interpolation (VFI) methods blindly predict where each object is at a specific timestep t ("time indexing"), which struggles to predict precise object movements. Given two images of a baseball, there are infinitely many possible trajectories: accelerating or decelerating, straight or curved. This often results in blurry frames as the method averages out these possibilities. Instead of forcing the network to learn this complicated time-to-location mapping implicitly, we provide the network with an explicit hint on how far the object has traveled between start and end frames, a novel approach termed "distance indexing". This method offers a clearer learning goal for models, reducing the uncertainty tied to object speeds. Moreover, even with this extra guidance, objects can still be blurry especially when they are equally far from both input frames, due to the directional ambiguity in long-range motion. To solve this, we propose an iterative reference-based estimation strategy that breaks down a long-range prediction into several short-range steps. When integrating our plug-and-play strategies into state-of-the-art learning-based models, they exhibit markedly superior perceptual quality in arbitrary time interpolations, using a uniform distance indexing map in the same format as time indexing without requiring extra computation. Furthermore, we demonstrate that if additional latency is acceptable, a continuous map estimator can be employed to compute a pixel-wise dense distance indexing using multiple nearby frames. Combined with efficient multi-frame refinement, this extension can further disambiguate complex motion, thus enhancing performance both qualitatively and quantitatively. Additionally, the ability to manually specify distance indexing allows for independent temporal manipulation of each object, providing a novel tool for video editing tasks such as re-timing.
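The gap between "time indexing" and "distance indexing" described in the abstract can be illustrated with a toy one-dimensional example. The sketch below is not the authors' code: the constant-acceleration motion model and the helper names `position` and `distance_index` are assumptions made purely for illustration. It shows that the same timestep t = 0.5 corresponds to different fractions of the total displacement depending on how the object moves, which is the ambiguity a distance hint would remove.

```python
def position(t, v0, a):
    """Displacement at normalized time t in [0, 1] under constant acceleration
    (an assumed toy motion model, not taken from the paper)."""
    return v0 * t + 0.5 * a * t * t


def distance_index(t, v0, a):
    """Fraction of the total start-to-end displacement covered by time t."""
    total = position(1.0, v0, a)
    return position(t, v0, a) / total


# Same timestep, three different motions between the two input frames.
uniform = distance_index(0.5, v0=1.0, a=0.0)        # constant speed
accelerating = distance_index(0.5, v0=0.0, a=2.0)   # starts at rest
decelerating = distance_index(0.5, v0=2.0, a=-2.0)  # comes to rest at t = 1

print(uniform, accelerating, decelerating)
```

In this toy setting the time index 0.5 maps to distance fractions of 0.5, 0.25 and 0.75 respectively, so a model given only t would have to guess among them.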

2310.05189 2026-03-03 cs.CL cs.AI cs.LG

Factuality Challenges in the Era of Large Language Models

Isabelle Augenstein, Timothy Baldwin, Meeyoung Cha, Tanmoy Chakraborty, Giovanni Luca Ciampaglia, David Corney, Renee DiResta, Emilio Ferrara, Scott Hale, Alon Halevy, Eduard Hovy, Heng Ji, Filippo Menczer, Ruben Miguez, Preslav Nakov, Dietram Scheufele, Shivam Sharma, Giovanni Zagni

Comments Our article offers a comprehensive examination of the challenges and risks associated with Large Language Models (LLMs), focusing on their potential impact on the veracity of information in today's digital landscape

Journal ref Nat Mach Intell 6, 852-863 (2024)

2304.03198 2026-03-03 cs.CV

RFAConv: Receptive-Field Attention Convolution for Improving Convolutional Neural Networks

Xin Zhang, Chen Liu, Degang Yang, Tingting Song, Yichen Ye, Ke Li, Yingze Song

Journal ref Pattern Recognition, 113208 (2026)

DOI: 10.1016/j.patcog.2026.113208

Abstract

In the realm of deep learning, spatial attention mechanisms have emerged as a vital method for enhancing the performance of convolutional neural networks. However, these mechanisms possess inherent limitations that cannot be overlooked. This work delves into the mechanism of spatial attention and reveals a new insight. It is that the mechanism essentially addresses the issue of convolutional parameter sharing. By addressing this issue, the convolutional kernel can efficiently extract features by employing varying weights at distinct locations. However, current spatial attention mechanisms focus on shallow attention to spatial features, which is insufficient to address the fundamental challenge of parameter sharing in convolutions involving larger kernels. In response to this challenge, we introduce a novel attention mechanism known as Receptive-Field Attention (RFA). Compared to existing spatial attention methods, RFA not only concentrates on the receptive-field spatial features but also offers effective attention weights for large convolutional kernels. Building upon the RFA concept, a Receptive-Field Attention Convolution (RFAConv) is proposed to supplant the conventional standard convolution. Notably, it offers nearly negligible increment of computational overhead and parameters, while significantly improving network performance. Furthermore, this work reveals that current spatial attention mechanisms require enhanced prioritization of receptive-field spatial features to optimize network performance. To validate the advantages of the proposed methods, we conduct many experiments across several authoritative datasets, including ImageNet, COCO, VOC, and Roboflow...
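The abstract's observation that attention lets a shared kernel apply "varying weights at distinct locations" can be made concrete with a small pure-Python sketch. This is not RFAConv and not the authors' implementation: the scoring rule (a softmax over the raw values of each window) and the names `patches` and `attended_conv` are hypothetical choices made only to show how a per-window attention map turns one shared kernel into location-dependent effective weights.

```python
import math


def softmax(values):
    """Numerically stable softmax over a flat list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    s = sum(exps)
    return [e / s for e in exps]


def patches(image, k):
    """Yield (row, col, flattened k*k window) for every valid window position."""
    h, w = len(image), len(image[0])
    for r in range(h - k + 1):
        for c in range(w - k + 1):
            yield r, c, [image[r + i][c + j] for i in range(k) for j in range(k)]


def attended_conv(image, kernel, k):
    """Apply a shared kernel that is rescaled per window by attention weights."""
    flat_kernel = [x for row in kernel for x in row]
    h, w = len(image), len(image[0])
    out = [[0.0] * (w - k + 1) for _ in range(h - k + 1)]
    effective = {}
    for r, c, patch in patches(image, k):
        attn = softmax(patch)  # hypothetical scoring rule, assumed for illustration
        eff = [kw * a for kw, a in zip(flat_kernel, attn)]
        effective[(r, c)] = eff  # effective weights actually used at this location
        out[r][c] = sum(e * p for e, p in zip(eff, patch))
    return out, effective


image = [
    [1.0, 2.0, 0.0, 1.0],
    [0.0, 1.0, 3.0, 1.0],
    [2.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 2.0],
]
kernel = [[1.0] * 3 for _ in range(3)]  # one shared 3x3 kernel
out, effective = attended_conv(image, kernel, 3)
print(effective[(0, 0)] != effective[(0, 1)])
```

Although `kernel` is shared by every window, the entries of `effective` differ from one location to the next, which is the sense in which attention relaxes parameter sharing in this toy setting.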

2008.07294 2026-03-03 cs.CV

AP-Loss for Accurate One-Stage Object Detection

Kean Chen, Weiyao Lin, Jianguo Li, John See, Ji Wang, Junni Zou

Comments Accepted to IEEE TPAMI. arXiv admin note: substantial text overlap with arXiv:1904.06373

Journal ref IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(11), 3782-3798, 2021

1904.06373 2026-03-03 cs.CV

Towards Accurate One-Stage Object Detection with AP-Loss

Kean Chen, Jianguo Li, Weiyao Lin, John See, Ji Wang, Lingyu Duan, Zhibo Chen, Changwei He, Junni Zou

Comments 13 pages, 7 figures, 4 tables, main paper + supplementary material, accepted to CVPR 2019

Journal ref IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019)

2603.00217 2026-03-03 cs.CV cs.AI cs.CR

Physical Evaluation of Naturalistic Adversarial Patches for Camera-Based Traffic-Sign Detection

Brianna D'Urso, Tahmid Hasan Sakib, Syed Rafay Hasan, Terry N. Guo

Comments Accepted to the 2nd IEEE Conference on Secure and Trustworthy CyberInfrastructure for IoT and Microelectronics (SaTC 2026), Houston, Texas, USA, March 24 to 26, 2026

2603.00207 2026-03-03 cs.CV cs.AI

VisRef: Visual Refocusing while Thinking Improves Test-Time Scaling in Multi-Modal Large Reasoning Models

Soumya Suvra Ghosal, Youngeun Kim, Zhuowei Li, Ritwick Chaudhry, Linghan Xu, Hongjing Zhang, Jakub Zablocki, Yifan Xing, Qin Zhang

Comments Accepted to CVPR 2026

2603.00206 2026-03-03 cs.CV cs.AI

TACIT Benchmark: A Programmatic Visual Reasoning Benchmark for Generative and Discriminative Models

Daniel Nobrega Medeiros

Comments 10 pages, 4 figures, 5 tables

2603.00201 2026-03-03 cs.CV cs.AI

AdURA-Net: Adaptive Uncertainty and Region-Aware Network

Antik Aich Roy, Ujjwal Bhattacharya

2603.00198 2026-03-03 cs.CV cs.AI

Stateful Token Reduction for Long-Video Hybrid VLMs

Jindong Jiang, Amala Sanjay Deshmukh, Kateryna Chumachenko, Karan Sapra, Zhiding Yu, Guilin Liu, Andrew Tao, Pavlo Molchanov, Jan Kautz, Wonmin Byeon

2603.00197 2026-03-03 cs.CV cs.AI

A Case Study on Concept Induction for Neuron-Level Interpretability in CNN

Moumita Sen Sarma, Samatha Ereshi Akkamahadevi, Pascal Hitzler

2603.00194 2026-03-03 cs.CV cs.AI cs.CR

SKeDA: A Generative Watermarking Framework for Text-to-video Diffusion Models

Yang Yang, Xinze Zou, Zehua Ma, Han Fang, Weiming Zhang

Comments 11 pages, 6 figures

2603.00190 2026-03-03 cs.LG cs.AI

OSF: On Pre-training and Scaling of Sleep Foundation Models

Zitao Shuai, Zongzhe Xu, David Yang, Wei Wang, Yuzhe Yang

2603.00188 2026-03-03 cs.CV cs.AI cs.LG

Efficient Long-Horizon GUI Agents via Training-Free KV Cache Compression

Bowen Zhou, Zhou Xu, Wanli Li, Jingyu Xiao, Haoqian Wang

2603.00182 2026-03-03 cs.RO cs.AI cs.LG cs.SY eess.SY

Embedding Morphology into Transformers for Cross-Robot Policy Learning

Kei Suzuki, Jing Liu, Ye Wang, Chiori Hori, Matthew Brand, Diego Romeres, Toshiaki Koike-Akino

Comments 17 pages, 8 figures (including appendix)

2603.00181 2026-03-03 cs.LG cs.AI cs.SE

Engineering FAIR Privacy-preserving Applications that Learn Histories of Disease

Ines N. Duarte, Praphulla M. S. Bhawsar, Lee K. Mason, Jeya Balaji Balasubramanian, Daniel E. Russ, Arlindo L. Oliveira, Jonas S. Almeida

2603.00180 2026-03-03 cs.LG cs.AI

NNiT: Width-Agnostic Neural Network Generation with Structurally Aligned Weight Spaces

Jiwoo Kim, Swarajh Mehta, Hao-Lun Hsu, Hyunwoo Ryu, Yudong Liu, Miroslav Pajic

2603.00176 2026-03-03 cs.LG cs.AI

Bridging Policy and Real-World Dynamics: LLM-Augmented Rebalancing for Shared Micromobility Systems

Heng Tan, Hua Yan, Yu Yang

Comments 8 pages, 7 figures, accepted by ICRA 2026

2603.00173 2026-03-03 cs.CV cs.AI cs.LG

Summer-22B: A Systematic Approach to Dataset Engineering and Training at Scale for Video Foundation Model

Simo Ryu, Chunghwan Han

Comments 28 pages, 16 figures, 5 tables

2603.00168 2026-03-03 cs.CV

Image-Based Classification of Olive Species Specific to Turkiye with Deep Neural Networks

Irfan Atabas, Hatice Karatas

2603.00165 2026-03-03 cs.CV

ConFoThinking: Consolidated Focused Attention Driven Thinking for Visual Question Answering

Zhaodong Wu, Haochen Xue, Qi Cao, Wenqi Mo, Yu Pei, Wenqi Xu, Jionglong Su, Yang Liu

2603.00163 2026-03-03 cs.CV cs.LG

A Boundary-Metric Evaluation Protocol for Whiteboard Stroke Segmentation Under Extreme Imbalance

Nicholas Korcynski

Comments 10 pages, 8 figures. Preprint

2603.00161 2026-03-03 cs.CV cs.LG

SKINOPATHY AI: Smartphone-Based Ophthalmic Screening and Longitudinal Tracking Using Lightweight Computer Vision

S. Kalaycioglu, C. Hong, M. Zhu, H. Xie

Comments 25 pages, 7 figures, 5 tables

2603.00160 2026-03-03 cs.CV cs.AI

DINOv3 Meets YOLO26 for Weed Detection in Vegetable Crops

Boyang Deng, Yuzhen Lu

Comments 10 pages, 2 figures

2603.00159 2026-03-03 cs.CV cs.AI cs.MM cs.SD

FlowPortrait: Reinforcement Learning for Audio-Driven Portrait Video Generation

Weiting Tan, Andy T. Liu, Ming Tu, Xinghua Qu, Philipp Koehn, Lu Lu

2603.00157 2026-03-03 cs.CV

FujiView: Multimodal Late-Fusion for Predicting Scenic Visibility

Bryceton Bible, Shah Md Nehal Hasnaeen, Hairong Qi

Comments 9 pages (including references), 8 figures, 2 tables. Accepted to the IEEE/CVF WACV 2026 proceedings. Introduces a large human-labeled Mount Fuji visibility dataset; public release forthcoming

2603.00156 2026-03-03 cs.CV

BiCLIP: Bidirectional and Consistent Language-Image Processing for Robust Medical Image Segmentation

Saivan Talaei, Fatemeh Daneshfar, Abdulhady Abas Abdullah, Mustaqeem Khan

2603.00155 2026-03-03 cs.CV cs.AI cs.IR

EfficientPosterGen: Semantic-aware Efficient Poster Generation via Token Compression and Accurate Violation Detection

Wenxin Tang, Jingyu Xiao, Yanpei Gong, Fengyuan Ran, Tongchuan Xia, Junliang Liu, Man Ho Lam, Wenxuan Wang, Michael R. Lyu

2603.00151 2026-03-03 cs.RO cs.CV

Multiview Progress Prediction of Robot Activities

Elena Zoppellari, Federico Becattini, Marco Fiorucci, Lamberto Ballan

Comments Accepted at ICASSP 2026

2603.00150 2026-03-03 cs.CV cs.CY

Attention to Neural Plagiarism: Diffusion Models Can Plagiarize Your Copyrighted Images!

Zihang Zou, Boqing Gong, Liqiang Wang

Comments Accepted to ICCV 2025. Code available at: https://github.com/zzzucf/Neural-Plagiarism
