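arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and more.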

2512.07155 2026-03-25 cs.CV

CHIMERA: Adaptive Cache Injection and Semantic Anchor Prompting for Zero-shot Image Morphing with Morphing-oriented Metrics

Dahyeon Kye, Jeahun Sung, Minkyu Jeon, Jihyong Oh

Comments Please visit our project page at https://cmlab-korea.github.io/CHIMERA/

2512.06796 2026-03-25 cs.RO

db-LaCAM: Fast and Scalable Multi-Robot Kinodynamic Motion Planning with Discontinuity-Bounded Search and Lightweight MAPF

Akmaral Moldagalieva, Keisuke Okumura, Amanda Prorok, Wolfgang Hönig

2512.06737 2026-03-25 cs.LG cs.AI cs.CL cs.CV cs.NE

Arc Gradient Descent: A Geometrically Motivated Gradient Descent-based Optimiser with Phase-Aware, User-Controlled Step Dynamics (proof-of-concept)

Nikhil Verma, Joonas Linnosmaa, Leonardo Espinosa-Leal, Napat Vajragupta

Comments 90 pages, 6 appendices, proof-of-concept

Abstract

The paper presents the formulation, implementation, and evaluation of the ArcGD optimiser. The evaluation is conducted initially on a non-convex benchmark function and subsequently on a real-world ML dataset. The initial comparative study using the Adam optimiser is conducted on a stochastic variant of the highly non-convex and notoriously challenging Rosenbrock function, renowned for its narrow, curved valley, across dimensions ranging from 2D to 1000D and an extreme case of 50,000D. Two configurations were evaluated to eliminate learning-rate bias: (i) both using ArcGD's effective learning rate and (ii) both using Adam's default learning rate. ArcGD consistently outperformed Adam under the first setting and, although slower under the second, achieved superior final solutions in most cases. In the second evaluation, ArcGD is evaluated against state-of-the-art optimizers (Adam, AdamW, Lion, SGD) on the CIFAR-10 image classification dataset across 8 diverse MLP architectures ranging from 1 to 5 hidden layers. ArcGD achieved the highest average test accuracy (50.7%) at 20,000 iterations, outperforming AdamW (46.6%), Adam (46.8%), SGD (49.6%), and Lion (43.4%), winning or tying on 6 of 8 architectures. Notably, Adam and AdamW showed strong early convergence at 5,000 iterations but regressed with extended training, whereas ArcGD continued improving, demonstrating generalization and resistance to overfitting without requiring early stopping tuning. Strong performance on geometric stress tests and standard deep-learning benchmarks indicates broad applicability, highlighting the need for further exploration. Moreover, it is also shown that both a limiting variant of ArcGD and a momentum-augmented ArcGD recover sign-based momentum updates, revealing a clear conceptual link between ArcGD's phase structure and the core mechanism of the Lion Optimiser.
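For readers unfamiliar with the benchmark named in the abstract, the following is a minimal sketch of the classical (deterministic) N-dimensional Rosenbrock function and its analytic gradient. The abstract does not specify how the stochastic variant is constructed, so none is shown here; the function names `rosenbrock` and `rosenbrock_grad` are illustrative and are not taken from the paper.

```python
def rosenbrock(x):
    """Classical N-dimensional Rosenbrock function.

    Textbook form: sum over i of 100*(x[i+1] - x[i]^2)^2 + (1 - x[i])^2.
    The global minimum is 0 at x = (1, ..., 1), at the bottom of a
    narrow, curved valley.
    """
    return sum(
        100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
        for i in range(len(x) - 1)
    )


def rosenbrock_grad(x):
    """Analytic gradient of the classical Rosenbrock function."""
    grad = [0.0] * len(x)
    for i in range(len(x) - 1):
        diff = x[i + 1] - x[i] ** 2
        # Each summand depends on x[i] and x[i+1]; accumulate both partials.
        grad[i] += -400.0 * x[i] * diff - 2.0 * (1.0 - x[i])
        grad[i + 1] += 200.0 * diff
    return grad
```

Any gradient-based optimiser can be exercised against these two functions; the dimension is set simply by the length of the input list.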


2512.04165 2026-03-25 cs.LG stat.ML

Mitigating the Curse of Detail: Scaling Arguments for Feature Learning and Sample Complexity

Noa Rubin, Orit Davidovich, Zohar Ringel

2512.03405 2026-03-25 cs.CV

ViDiC: Video Difference Captioning

Jiangtao Wu, Shihao Li, Zhaozhou Bian, Jialu Chen, Runzhe Wen, An Ping, Yiwen He, Jiakai Wang, Yuanxing Zhang, Jiaheng Liu

2512.03044 2026-03-25 cs.RO

Video2Act: A Dual-System Video Diffusion Policy with Robotic Spatio-Motional Modeling

Yueru Jia, Jiaming Liu, Shengbang Liu, Rui Zhou, Wanhe Yu, Yuyang Yan, Xiaowei Chi, Yandong Guo, Boxin Shi, Shanghang Zhang

2512.02982 2026-03-25 cs.CV cs.RO

U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences

Xiang Xu, Alan Liang, Youquan Liu, Linfeng Li, Lingdong Kong, Ziwei Liu, Qingshan Liu

Comments CVPR 2026; 20 pages, 7 figures, 11 tables; Code at https://github.com/worldbench/U4D

2512.02505 2026-03-25 cs.CV

GeoDiT: A Diffusion-based Vision-Language Model for Geospatial Understanding

Jiaqi Liu, Ronghao Fu, Haoran Liu, Lang Sun, Bo Yang

2512.02487 2026-03-25 cs.CV cs.AI

Masking Matters: Unlocking the Spatial Reasoning Capabilities of LLMs for 3D Scene-Language Understanding

Yerim Jeon, Miso Lee, WonJun Moon, Jae-Pil Heo

Comments Accepted to CVPR 2026. GitHub Page: https://github.com/Jyerim/3D-SLIM

2512.02448 2026-03-25 cs.CV cs.RO

nuScenes Revisited: Progress and Challenges in Autonomous Driving

Whye Kit Fong, Venice Erin Liong, Kok Seang Tan, Holger Caesar

Comments 18 pages, 17 figures

2512.01248 2026-03-25 cs.CV

TRivia: Self-supervised Fine-tuning of Vision-Language Models for Table Recognition

Junyuan Zhang, Bin Wang, Qintong Zhang, Fan Wu, Zichen Wen, Jialin Lu, Junjie Shan, Ziqi Zhao, Shuya Yang, Ziling Wang, Ziyang Miao, Huaping Zhong, Yuhang Zang, Xiaoyi Dong, Ka-Ho Chow, Conghui He

Comments Accepted by CVPR 2026

2511.22815 2026-03-25 cs.CV

Captain Safari: A World Engine with Pose-Aligned 3D Memory

Yu-Cheng Chou, Xingrui Wang, Yitong Li, Jiahao Wang, Hanting Liu, Cihang Xie, Alan Yuille, Junfei Xiao

2511.22076 2026-03-25 cs.AI

Hybrid Stackelberg Game and Diffusion-based Auction for Two-tier Agentic AI Task Offloading in Internet of Agents

Yue Zhong, Yongju Tong, Jiawen Kang, Minghui Dai, Hong-Ning Dai, Zhou Su, Dusit Niyato

Comments Revisions are needed

2511.21732 2026-03-25 cs.CL cs.AI

HUMORCHAIN: Theory-Guided Multi-Stage Reasoning for Interpretable Multimodal Humor Generation

Jiajun Zhang, Shijia Luo, Ruikang Zhang, Qi Su

2511.20996 2026-03-25 cs.CV

From Inpainting to Layer Decomposition: Repurposing Generative Inpainting Models for Image Layer Decomposition

Jingxi Chen, Yixiao Zhang, Xiaoye Qian, Zongxia Li, Cornelia Fermuller, Caren Chen, Yiannis Aloimonos

Comments Accepted by CVPR 2026

2511.20565 2026-03-25 cs.CV

DINO-Tok: Adapting DINO for Visual Tokenizers

Mingkai Jia, Mingxiao Li, Zhijian Shu, Anlin Zheng, Liaoyuan Fan, Jiaxin Guo, Tianxing Shi, Dongyue Lu, Zeming Li, Xiaoyang Guo, Xiaojuan Qi, Xiao-Xiao Long, Qian Zhang, Ping Tan, Wei Yin

2511.20515 2026-03-25 cs.CV

HalDec-Bench: Benchmarking Hallucination Detector in Image Captioning

Kuniaki Saito, Risa Shinoda, Shohei Tanaka, Tosho Hirasawa, Fumio Okura, Yoshitaka Ushiku

Comments Previously this version appeared as arXiv:2603.15253 which was submitted as a new work by accident

2511.20008 2026-03-25 cs.CV cs.AI

Pedestrian Crossing Intention Prediction Using Multimodal Fusion Network

Yuanzhe Li, Steffen Müller

Comments 29th IAVSD International Symposium on Dynamics of Vehicles on Roads and Tracks (IAVSD 2025)

2511.17181 2026-03-25 cs.CV cs.LG cs.SD

Investigating self-supervised representations for audio-visual deepfake detection

Dragos-Alexandru Boldisor, Stefan Smeu, Dan Oneata, Elisabeta Oneata

Comments Accepted to the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026

2511.16105 2026-03-25 cs.LG

Data-Efficient and Robust Trajectory Generation through Pathlet Dictionary Learning

Yuanbo Tang, Yan Tang, Zixuan Zhang, Zihui Zhao, Yang Li

2511.14440 2026-03-25 cs.CV

Learning to See Through a Baby's Eyes: Early Visual Diets Enable Robust Visual Intelligence in Humans and Machines

Yusen Cai, Qing Lin, Bhargava Satya Nunna, Mengmi Zhang

2511.12449 2026-03-25 cs.CV cs.AI cs.IR cs.LG

MOON2.0: Dynamic Modality-balanced Multimodal Representation Learning for E-commerce Product Understanding

Zhanheng Nie, Chenghan Fu, Daoze Zhang, Junxian Wu, Wanxian Guan, Pengjie Wang, Jian Xu, Bo Zheng

Comments 11 pages, 7 figures

2511.09211 2026-03-25 cs.LG

Parameter-Free Clustering via Self-Supervised Consensus Maximization (Extended Version)

Lijun Zhang, Suyuan Liu, Siwei Wang, Shengju Yu, Xueling Zhu, Miaomiao Li, Xinwang Liu

Comments Accepted by AAAI 2026

2511.07808 2026-03-25 cs.CV

DI3CL: Contrastive Learning With Dynamic Instances and Contour Consistency for SAR Land-Cover Classification Foundation Model

Zhongle Ren, Hui Ding, Kai Wang, Biao Hou, Xingyu Luo, Weibin Li, Licheng Jiao

Comments 16 pages, 7 figures; Accepted for publication in IEEE Transactions on Image Processing (TIP)

2511.07719 2026-03-25 cs.AI cs.CV

Operational machine learning for remote spectroscopic detection of CH$_{4}$ point sources

Vít Růžička, Gonzalo Mateo-García, Itziar Irakulis-Loitxate, Juan Emmanuel Johnson, Manuel Montesino San Martín, Anna Allen, Alma Raunak, Carol Castaneda, Luis Guanter, David R. Thompson

Comments 20 pages, 14 figures, 10 tables. In review

Abstract

Mitigating anthropogenic methane sources is one of the most cost-effective levers to slow down global warming. While satellite-based imaging spectrometers, such as EMIT, PRISMA, and EnMAP, can detect these point sources, current methane retrieval methods based on matched filters produce a high number of false detections requiring manual verification. To address this challenge, we deployed an ML system for detecting methane emissions within the Methane Alert and Response System (MARS) of UNEP's IMEO. This represents the first operational deployment of automated methane point-source detection using spaceborne imaging spectrometers, providing regular global coverage and scalability to future constellations with even higher data volumes. This task required several technical advances. First, we created one of the largest, most diverse, and most global ML-ready datasets to date of annotated methane plumes from three imaging spectrometer missions, and quantitatively compared different deep learning model configurations. Second, we extended prior evaluation methodologies from small, tiled datasets to full granules that are more representative of operational use. This revealed that deep learning models still produce a large number of false detections, a problem we addressed with model ensembling, which reduced false detections by over 74%. During 11 months of operational deployment, our system processed more than 25,000 hyperspectral products, facilitating the verification of 2,851 distinct methane leaks, which resulted in 834 stakeholder notifications. We further demonstrate the model's utility in verifying mitigation success through case studies in Libya, Argentina, Oman, and Azerbaijan. Our work represents a critical step towards a global AI-assisted methane leak detection system, which is required to process the dramatically higher data volumes expected from current and future imaging spectrometers.


2511.06901 2026-03-25 cs.CV

Classification of Microplastic Particles in Water using Polarized Light Scattering and Machine Learning Methods

Leonard Saur, Marc von Pawlowski, Ulrich Gengenbach, Ingo Sieber, Hossein Shirali, Lorenz Wührl, Xiangyu Weng, Rainer Kiko, Christian Pylatiuk

Comments 22 pages, 9 figures

2511.05876 2026-03-25 cs.CV cs.LG

MoEGCL: Mixture of Ego-Graphs Contrastive Representation Learning for Multi-View Clustering

Jian Zhu, Xin Zou, Jun Sun, Cheng Luo, Lei Liu, Lingfang Zeng, Ning Zhang, Bian Wu, Chang Tang, Lirong Dai

2511.03334 2026-03-25 cs.CV

UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions

Guozhen Zhang, Zixiang Zhou, Teng Hu, Ziqiao Peng, Youliang Zhang, Yi Chen, Yuan Zhou, Qinglin Lu, Limin Wang

Comments CVPR 2026

2510.26865 2026-03-25 cs.CV cs.AI

Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench

Fenfen Lin, Yesheng Liu, Haiyu Xu, Chen Yue, Zheqi He, Mingxuan Zhao, Miguel Hu Chen, Jiakang Liu, JG Yao, Xi Yang

Comments Project page: https://flageval-baai.github.io/MeasureBenchPage/

2510.21356 2026-03-25 cs.CV cs.AI

Gaze-VLM: Bridging Gaze and VLMs through Attention Regularization for Egocentric Understanding

Anupam Pani, Yanchao Yang
