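arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and more.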

2601.16216 2026-01-23 cs.AI cs.GT cs.SE

Scalable Board Expansion within a General Game System

Clémentine Sacré

Comments 65 pages, 41 figures

2601.16214 2026-01-23 cs.CV

CamPilot: Improving Camera Control in Video Diffusion Model with Efficient Camera Reward Feedback

Wenhang Ge, Guibao Shen, Jiawei Feng, Luozhou Wang, Hao Lu, Xingye Tian, Xin Tao, Ying-Cong Chen

2601.16208 2026-01-23 cs.CV

Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders

Shengbang Tong, Boyang Zheng, Ziteng Wang, Bingda Tang, Nanye Ma, Ellis Brown, Jihan Yang, Rob Fergus, Yann LeCun, Saining Xie

Comments website: https://rae-dit.github.io/scale-rae/

2601.16205 2026-01-23 cs.LG cs.AI

Counterfactual Training: Teaching Models Plausible and Actionable Explanations

Patrick Altmeyer, Aleksander Buszydlik, Arie van Deursen, Cynthia C. S. Liem

Comments This work has been accepted for publication at the 2026 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML). The final version will be available on IEEE Xplore

2601.16192 2026-01-23 cs.CV

360Anything: Geometry-Free Lifting of Images and Videos to 360°

Ziyi Wu, Daniel Watson, Andrea Tagliasacchi, David J. Fleet, Marcus A. Brubaker, Saurabh Saxena

Comments Project page: https://360anything.github.io/

2601.16163 2026-01-23 cs.AI cs.RO

Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning

Moo Jin Kim, Yihuai Gao, Tsung-Yi Lin, Yen-Chen Lin, Yunhao Ge, Grace Lam, Percy Liang, Shuran Song, Ming-Yu Liu, Chelsea Finn, Jinwei Gu

Abstract

Recent video generation models demonstrate remarkable ability to capture complex physical interactions and scene evolution over time. To leverage their spatiotemporal priors, robotics works have adapted video models for policy learning but introduce complexity by requiring multiple stages of post-training and new architectural components for action generation. In this work, we introduce Cosmos Policy, a simple approach for adapting a large pretrained video model (Cosmos-Predict2) into an effective robot policy through a single stage of post-training on the robot demonstration data collected on the target platform, with no architectural modifications. Cosmos Policy learns to directly generate robot actions encoded as latent frames within the video model's latent diffusion process, harnessing the model's pretrained priors and core learning algorithm to capture complex action distributions. Additionally, Cosmos Policy generates future state images and values (expected cumulative rewards), which are similarly encoded as latent frames, enabling test-time planning of action trajectories with higher likelihood of success. In our evaluations, Cosmos Policy achieves state-of-the-art performance on the LIBERO and RoboCasa simulation benchmarks (98.5% and 67.1% average success rates, respectively) and the highest average score in challenging real-world bimanual manipulation tasks, outperforming strong diffusion policies trained from scratch, video model-based policies, and state-of-the-art vision-language-action models fine-tuned on the same robot demonstrations. Furthermore, given policy rollout data, Cosmos Policy can learn from experience to refine its world model and value function and leverage model-based planning to achieve even higher success rates in challenging tasks. We release code, models, and training data at https://research.nvidia.com/labs/dir/cosmos-policy/


2601.14490 2026-01-23 cs.CV cs.AI cs.CL cs.LG

GutenOCR: A Grounded Vision-Language Front-End for Documents

Hunter Heidenreich, Ben Elliott, Olivia Dinica, Yosheb Getachew

2601.10720 2026-01-23 cs.RO

Verified Design of Robotic Autonomous Systems using Probabilistic Model Checking

Atef Azaiez, David Alireza Anisi

Comments Accepted in ModelSWARD 2026 conference, 7 figures

2601.08875 2026-01-23 cs.CV cs.AI cs.LG

Learning Domain-Invariant Representations for Cross-Domain Image Registration via Scene-Appearance Disentanglement

Jiahao Qin, Yiwen Wang

Comments 6 pages, 2 figures, 4 tables. Code available at https://github.com/D-ST-Sword/SAR-NET

2507.01001 2026-01-23 cs.CL cs.AI

SciArena: An Open Evaluation Platform for Non-Verifiable Scientific Literature-Grounded Tasks

Yilun Zhao, Kaiyan Zhang, Tiansheng Hu, Sihong Wu, Ronan Le Bras, Charles McGrady, Taira Anderson, Jonathan Bragg, Joseph Chee Chang, Jesse Dodge, Matt Latzke, Yixin Liu, Xiangru Tang, Zihang Wang, Chen Zhao, Hannaneh Hajishirzi, Doug Downey, Arman Cohan

Comments NeurIPS 2025 Datasets & Benchmarks Track (Spotlight)

2505.22797 2026-01-23 cs.CV cs.NA math.NA physics.med-ph

Fast Trajectory-Independent Model-Based Reconstruction Algorithm for Multi-Dimensional Magnetic Particle Imaging

Vladyslav Gapyak, Thomas März, Andreas Weinmann

Comments 10 pages, 5 figures. This work has been submitted to the IEEE for possible publication

Journal ref Phys. Med. Biol. 70 (2025) 235028

2503.18944 2026-01-23 cs.CV

DINO in the Room: Leveraging 2D Foundation Models for 3D Segmentation

Karim Knaebel, Kadir Yilmaz, Daan de Geus, Alexander Hermans, David Adrian, Timm Linder, Bastian Leibe

Comments Accepted to 3DV 2026. Project page at https://vision.rwth-aachen.de/ditr

2411.09693 2026-01-23 cs.CV

CropCraft: Complete Structural Characterization of Crop Plants From Images

Albert J. Zhai, Xinlei Wang, Kaiyuan Li, Zhao Jiang, Junxiong Zhou, Sheng Wang, Zhenong Jin, Kaiyu Guan, Shenlong Wang

Comments 3DV 2026 (Oral). Project page: https://ajzhai.github.io/CropCraft

2409.11355 2026-01-23 cs.CV

Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Gonzalo Martin Garcia, Karim Knaebel, Christian Schmidt, Daan de Geus, Alexander Hermans, Bastian Leibe

Comments WACV 2025 Oral. Project page at https://vision.rwth-aachen.de/diffusion-e2e-ft

2408.09253 2026-01-23 cs.RO cs.SY eess.SY

Reinforcement Learning Compensated Model Predictive Control for Off-road Driving on Unknown Deformable Terrain

Prakhar Gupta, Jonathon M. Smereka, Yunyi Jia

Comments Submitted to IEEE Transactions on Intelligent Vehicles as a Regular Paper; was withdrawn in March 2025. A revised version of this manuscript was submitted to ACC 2025 review as a regular paper in Sep 2025

2403.16428 2026-01-23 cs.CV

Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects

Zicong Fan, Takehiko Ohkawa, Linlin Yang, Nie Lin, Zhishan Zhou, Shihao Zhou, Jiajun Liang, Zhong Gao, Xuanyang Zhang, Xue Zhang, Fei Li, Zheng Liu, Feng Lu, Karim Knaebel, Bastian Leibe, Jeongwan On, Seungryul Baek, Aditya Prakash, Saurabh Gupta, Kun He, Yoichi Sato, Otmar Hilliges, Hyung Jin Chang, Angela Yao

Comments Accepted to ECCV 2024

2402.05406 2026-01-23 cs.LG cs.CL

Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes

Steven Kolawole, Lucio Dery, Jean-François Kagy, Virginia Smith, Graham Neubig, Ameet Talwalkar

Comments 19 pages, 6 figures, 16 tables

2401.18034 2026-01-23 cs.CL cs.AI

Paramanu: Compact and Competitive Monolingual Language Models for Low-Resource Morphologically Rich Indian Languages

Mitodru Niyogi, Eric Gaussier, Arnab Bhattacharya

2303.16570 2026-01-23 cs.CV

Point2Vec for Self-Supervised Representation Learning on Point Clouds

Karim Knaebel, Jonas Schult, Alexander Hermans, Bastian Leibe

Comments Accepted at GCPR 2023. Project page at https://vision.rwth-aachen.de/point2vec

2601.16158 2026-01-23 cs.SD cs.LG

Domain-Incremental Continual Learning for Robust and Efficient Keyword Spotting in Resource Constrained Systems

Prakash Dhungana, Sayed Ahmad Salehi

Comments 12 pages, 8 figures, and 3 tables

2601.16155 2026-01-23 cs.CV cs.IR

HVD: Human Vision-Driven Video Representation Learning for Text-Video Retrieval

Zequn Xie, Xin Liu, Boyun Zhang, Yuxiao Lin, Sihang Cai, Tao Jin

Comments Accepted by ICASSP 2026

2601.16150 2026-01-23 cs.SD cs.AI

Pay (Cross) Attention to the Melody: Curriculum Masking for Single-Encoder Melodic Harmonization

Maximos Kaliakatsos-Papakostas, Dimos Makris, Konstantinos Soiledis, Konstantinos-Theodoros Tsamis, Vassilis Katsouros, Emilios Cambouropoulos

2601.16147 2026-01-23 cs.LG

Beat-ssl: Capturing Local ECG Morphology through Heartbeat-level Contrastive Learning with Soft Targets

Muhammad Ilham Rizqyawan, Peter Macfarlane, Stathis Hadjidemetriou, Fani Deligianni

Comments Accepted at ISBI 2026

2601.16140 2026-01-23 cs.CV cs.AI cs.CR

Learning to Watermark in the Latent Space of Generative Models

Sylvestre-Alvise Rebuffi, Tuan Tran, Valeriu Lacatusu, Pierre Fernandez, Tomáš Souček, Nikola Jovanović, Tom Sander, Hady Elsahar, Alexandre Mourachko

Comments Code and models are available at https://github.com/facebookresearch/distseal

2601.16139 2026-01-23 cs.LG

On the Intrinsic Dimensions of Data in Kernel Learning

Rustem Takhanov

Comments Accepted to The 29th International Conference on Artificial Intelligence and Statistics (AISTATS 2026)

2601.16138 2026-01-23 cs.CL cs.LG

Automatic Classification of Arabic Literature into Historical Eras

Zainab Alhathloul, Irfan Ahmad

Comments 27 pages

2601.16134 2026-01-23 cs.AI cs.CL

LLM Prompt Evaluation for Educational Applications

Langdon Holmes, Adam Coscia, Scott Crossley, Joon Suh Choi, Wesley Morris

2601.16125 2026-01-23 cs.CV cs.CL cs.IR

Rethinking Composed Image Retrieval Evaluation: A Fine-Grained Benchmark from Image Editing

Tingyu Song, Yanzhao Zhang, Mingxin Li, Zhuoning Guo, Dingkun Long, Pengjun Xie, Siyue Zhang, Yilun Zhao, Shu Wu

Comments Under review

2601.16113 2026-01-23 cs.CL cs.CV

synthocr-gen: A synthetic ocr dataset generator for low-resource languages- breaking the data barrier

Haq Nawaz Malik, Kh Mohmad Shafi, Tanveer Ahmad Reshi

2601.16112 2026-01-23 cs.LG

Variable Splitting Binary Tree Models Based on Bayesian Context Tree Models for Time Series Segmentation

Yuta Nakahara, Shota Saito, Kohei Horinouchi, Koshi Shimada, Naoki Ichijo, Manabu Kobayashi, Toshiyasu Matsushima
