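arXivDaily daily academic digest: syncs the full arXiv feed with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.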

HOT Artificial Intelligence, Robotics, etc. (9)

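cs.AI Artificial Intelligence, cs.CV Computer Vision, cs.CL Natural Language Processing, cs.RO Robotics, cs.LG Machine Learning, cs.SD Sound, cs.ET Emerging Technologies, eess.AS Audio and Speech Processing, eess.IV Image and Video Processing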

CS Computer Science (41)

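cs Computer Science, cs.AI Artificial Intelligence, cs.AR Hardware Architecture, cs.CC Computational Complexity, cs.CE Computational Engineering, cs.CG Computational Geometry, cs.CL Natural Language Processing, cs.CR Cryptography and Security, cs.CV Computer Vision, cs.CY Computers and Society, cs.DB Databases, cs.DC Distributed Computing, cs.DL Digital Libraries, cs.DM Discrete Mathematics, cs.DS Data Structures and Algorithms, cs.ET Emerging Technologies, cs.FL Formal Languages, cs.GL General Literature, cs.GR Graphics, cs.GT Game Theory, cs.HC Human-Computer Interaction, cs.IR Information Retrieval, cs.IT Information Theory, cs.LG Machine Learning, cs.LO Logic in Computer Science, cs.MA Multiagent Systems, cs.MM Multimedia, cs.MS Mathematical Software, cs.NA Numerical Analysis, cs.NE Neural and Evolutionary Computing, cs.NI Networking and Internet Architecture, cs.OH Other Computer Science, cs.OS Operating Systems, cs.PF Performance, cs.PL Programming Languages, cs.RO Robotics, cs.SC Symbolic Computation, cs.SD Sound, cs.SE Software Engineering, cs.SI Social and Information Networks, cs.SY Systems and Control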

ECON Economics (4)

econ Economics, econ.EM Econometrics, econ.GN General Economics, econ.TH Theoretical Economics

EESS Electrical Engineering and Systems Science (5)

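eess Electrical Engineering and Systems Science, eess.AS Audio and Speech Processing, eess.IV Image and Video Processing, eess.SP Signal Processing, eess.SY Systems and Control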

MATH Mathematics (33)

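math Mathematics, math.AC Commutative Algebra, math.AG Algebraic Geometry, math.AP Partial Differential Equations, math.AT Algebraic Topology, math.CA Classical Analysis, math.CO Combinatorics, math.CT Category Theory, math.CV Complex Variables, math.DG Differential Geometry, math.DS Dynamical Systems, math.FA Functional Analysis, math.GM General Mathematics, math.GN General Topology, math.GR Group Theory, math.GT Geometric Topology, math.HO History and Overview, math.IT Information Theory, math.KT K-Theory, math.LO Logic, math.MG Metric Geometry, math.MP Mathematical Physics, math.NA Numerical Analysis, math.NT Number Theory, math.OA Operator Algebras, math.OC Optimization and Control, math.PR Probability, math.QA Quantum Algebra, math.RA Rings and Algebras, math.RT Representation Theory, math.SG Symplectic Geometry, math.SP Spectral Theory, math.ST Statistics Theory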

PHYSICS Physics (55)

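astro-ph Astrophysics, astro-ph.CO Cosmology, astro-ph.EP Earth and Planetary Astrophysics, astro-ph.GA Astrophysics of Galaxies, astro-ph.HE High Energy Astrophysics, astro-ph.IM Astronomical Instrumentation, astro-ph.SR Solar and Stellar Astrophysics, cond-mat Condensed Matter, cond-mat.dis-nn Disordered Systems and Neural Networks, cond-mat.mes-hall Mesoscale and Nanoscale Physics, cond-mat.mtrl-sci Materials Science, cond-mat.other Other Condensed Matter, cond-mat.quant-gas Quantum Gases, cond-mat.soft Soft Condensed Matter, cond-mat.stat-mech Statistical Mechanics, cond-mat.str-el Strongly Correlated Electrons, cond-mat.supr-con Superconductivity, gr-qc General Relativity, hep-ex High Energy Physics (Experiment), hep-lat High Energy Physics (Lattice), hep-ph High Energy Physics (Phenomenology), hep-th High Energy Physics (Theory), math-ph Mathematical Physics, nlin Nonlinear Sciences, nlin.AO Adaptive Systems, nlin.CD Chaotic Dynamics, nlin.CG Cellular Automata, nlin.PS Pattern Formation and Solitons, nlin.SI Integrable Systems, nucl-ex Nuclear Experiment, nucl-th Nuclear Theory, physics Physics, physics.acc-ph Accelerator Physics, physics.ao-ph Atmospheric and Oceanic Physics, physics.app-ph Applied Physics, physics.atm-clus Atomic and Molecular Clusters, physics.atom-ph Atomic Physics, physics.bio-ph Biological Physics, physics.chem-ph Chemical Physics, physics.class-ph Classical Physics, physics.comp-ph Computational Physics, physics.data-an Data Analysis, physics.ed-ph Physics Education, physics.flu-dyn Fluid Dynamics, physics.gen-ph General Physics, physics.geo-ph Geophysics, physics.hist-ph History and Philosophy of Physics, physics.ins-det Instrumentation and Detectors, physics.med-ph Medical Physics, physics.optics Optics, physics.plasm-ph Plasma Physics, physics.pop-ph Popular Physics, physics.soc-ph Physics and Society, physics.space-ph Space Physics, quant-ph Quantum Physics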

Q-BIO Quantitative Biology (11)

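q-bio Quantitative Biology, q-bio.BM Biomolecules, q-bio.CB Cell Behavior, q-bio.GN Genomics, q-bio.MN Molecular Networks, q-bio.NC Neurons and Cognition, q-bio.OT Other Quantitative Biology, q-bio.PE Populations and Evolution, q-bio.QM Quantitative Methods, q-bio.SC Subcellular Processes, q-bio.TO Tissues and Organs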

Q-FIN Quantitative Finance (10)

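q-fin Quantitative Finance, q-fin.CP Computational Finance, q-fin.EC Economics, q-fin.GN General Finance, q-fin.MF Mathematical Finance, q-fin.PM Portfolio Management, q-fin.PR Pricing of Securities, q-fin.RM Risk Management, q-fin.ST Statistical Finance, q-fin.TR Trading and Market Microstructure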

STAT Statistics (7)

stat Statistics, stat.AP Applications, stat.CO Computation, stat.ME Methodology, stat.ML Machine Learning, stat.OT Other Statistics, stat.TH Statistics Theory

2602.06176 2026-02-09 cs.AI cs.CL cs.LG

Large Language Model Reasoning Failures

Peiyang Song, Pengrui Han, Noah Goodman

Comments Repository: https://github.com/Peiyang-Song/Awesome-LLM-Reasoning-Failures. Published at TMLR 2026 with Survey Certification

2602.06166 2026-02-09 cs.CV

M3: High-fidelity Text-to-Image Generation via Multi-Modal, Multi-Agent and Multi-Round Visual Reasoning

Bangji Yang, Ruihan Guo, Jiajun Fan, Chaoran Cheng, Ge Liu

2602.06163 2026-02-09 cs.CV

MetaSSP: Enhancing Semi-supervised Implicit 3D Reconstruction through Meta-adaptive EMA and SDF-aware Pseudo-label Evaluation

Luoxi Zhang, Chun Xie, Itaru Kitahara

2602.06158 2026-02-09 cs.CV

MGP-KAD: Multimodal Geometric Priors and Kolmogorov-Arnold Decoder for Single-View 3D Reconstruction in Complex Scenes

Luoxi Zhang, Chun Xie, Itaru Kitahara

Comments 6 pages. Published in IEEE International Conference on Image Processing (ICIP) 2025

Journal ref Proc. IEEE International Conference on Image Processing (ICIP), 2025, pp. 1564-1569

2602.06157 2026-02-09 cs.LG

SCONE: A Practical, Constraint-Aware Plug-in for Latent Encoding in Learned DNA Storage

Cihan Ruan, Lebin Zhou, Rongduo Han, Linyi Han, Bingqing Zhao, Chenchen Zhu, Wei Jiang, Wei Wang, Nam Ling

2602.06155 2026-02-09 cs.LG stat.ML

Latent Structure Emergence in Diffusion Models via Confidence-Based Filtering

Wei Wei, Yizhou Zeng, Kuntian Chen, Sophie Langer, Mariia Seleznova, Hung-Hsu Chou

2602.06146 2026-02-09 cs.LG math.OC stat.ML

Optimistic Training and Convergence of Q-Learning -- Extended Version

Prashant Mehta, Sean Meyn

2602.06139 2026-02-09 cs.CV

EgoAVU: Egocentric Audio-Visual Understanding

Ashish Seth, Xinhao Mei, Changsheng Zhao, Varun Nagaraja, Ernie Chang, Gregory P. Meyer, Gael Le Lan, Yunyang Xiong, Vikas Chandra, Yangyang Shi, Dinesh Manocha, Zhipeng Cai

2602.06129 2026-02-09 cs.LG cs.AI

Urban Spatio-Temporal Foundation Models for Climate-Resilient Housing: Scaling Diffusion Transformers for Disaster Risk Prediction

Olaf Yunus Laitinen Imanov, Derya Umut Kulali, Taner Yilmaz

Comments 10 pages, 5 figures. Submitted to IEEE Transactions on Intelligent Vehicles

2602.06127 2026-02-09 cs.LG

Compressing LLMs with MoP: Mixture of Pruners

Bruno Lopes Yamamoto, Lucas Lauton de Alcantara, Victor Zacarias, Leandro Giusti Mugnaini, Keith Ando Ogawa, Lucas Pellicer, Rosimeire Pereira Costa, Edson Bollis, Anna Helena Reali Costa, Artur Jordao

Comments Code and models are available at: https://github.com/c2d-usp/Efficient-LLMs-with-MoP

2602.06122 2026-02-09 cs.CV

From Blurry to Believable: Enhancing Low-quality Talking Heads with 3D Generative Priors

Ding-Jiun Huang, Yuanhao Wang, Shao-Ji Yuan, Albert Mosella-Montoro, Francisco Vicente Carrasco, Cheng Zhang, Fernando De la Torre

Comments Accepted to 3DV 2026. Project Page: https://humansensinglab.github.io/super-head/

2602.06110 2026-02-09 cs.LG cs.CR quant-ph

Private and interpretable clinical prediction with quantum-inspired tensor train models

José Ramón Pareja Monturiol, Juliette Sinnott, Roger G. Melko, Mohammad Kohandel

Comments 21 pages, 5 figures, 9 tables. The code for the experiments is publicly available at https://github.com/joserapa98/tns4loris

2602.06107 2026-02-09 cs.AI

Jackpot: Optimal Budgeted Rejection Sampling for Extreme Actor-Policy Mismatch Reinforcement Learning

Zhuoming Chen, Hongyi Liu, Yang Zhou, Haizhong Zheng, Beidi Chen

Comments ICLR 2026

2602.06103 2026-02-09 cs.LG

Toward Faithful and Complete Answer Construction from a Single Document

Zhaoyang Chen, Cody Fleming

2602.06097 2026-02-09 cs.LG

Agentic Workflow Using RBA$_θ$ for Event Prediction

Purbak Sengupta, Sambeet Mishra, Sonal Shreya

2602.06093 2026-02-09 cs.LG cs.AI

NanoNet: Parameter-Efficient Learning with Label-Scarce Supervision for Lightweight Text Mining Model

Qianren Mao, Yashuo Luo, Ziqi Qin, Junnan Liu, Weifeng Jiang, Zhijun Chen, Zhuoran Li, Likang Xiao, Chuou Xu, Qili Zhang, Hanwen Hao, Jingzheng Li, Chunghua Lin, Jianxin Li, Philip S. Yu

2602.06087 2026-02-09 cs.RO cs.SY eess.SY

Dynamic Modeling, Parameter Identification and Numerical Analysis of Flexible Cables in Flexibly Connected Dual-AUV Systems

Kuo Chen, Minghao Dou, Qianqi Liu, Yang An, Kai Ren, Zeming WU, Yu Tian, Jie Sun, Xinping Wang, Zhier Chen, Jiancheng Yu

2602.06053 2026-02-09 cs.CL

PersonaPlex: Voice and Role Control for Full Duplex Conversational Speech Models

Rajarshi Roy, Jonathan Raiman, Sang-gil Lee, Teodor-Dumitru Ene, Robert Kirby, Sungwon Kim, Jaehyeon Kim, Bryan Catanzaro

2602.06050 2026-02-09 cs.CL cs.CV

Relevance-aware Multi-context Contrastive Decoding for Retrieval-augmented Visual Question Answering

Jongha Kim, Byungoh Ko, Jeehye Na, Jinsung Yoon, Hyunwoo J. Kim

Comments WACV 2026

2602.06049 2026-02-09 cs.CL cs.AI

Recontextualizing Famous Quotes for Brand Slogan Generation

Ziao Yang, Zizhang Chen, Lei Zhang, Hongfu Liu

2602.04836 2026-02-09 cs.AI

Are AI Capabilities Increasing Exponentially? A Competing Hypothesis

Haosen Ge, Hamsa Bastani, Osbert Bastani

2601.21112 2026-02-09 cs.AI cs.SE

How does information access affect LLM monitors' ability to detect sabotage?

Rauno Arike, Raja Mehta Moreno, Rohan Subramani, Shubhorup Biswas, Francis Rhys Ward

Comments 54 pages, 34 figures, 7 tables

2512.17873 2026-02-09 cs.CV

Preserving Spectral Structure and Statistics in Diffusion Models

Baohua Yan, Jennifer Kava, Qingyuan Liu, Xuan Di

2512.17043 2026-02-09 cs.AI

UniRel: Relation-Centric Knowledge Graph Question Answering with RL-Tuned LLM Reasoning

Yinxu Tang, Chengsong Huang, Jiaxin Huang, William Yeoh

2512.12783 2026-02-09 cs.LG q-fin.ST stat.AP

Credit Risk Estimation with Non-Financial Features: Evidence from a Synthetic Istanbul Dataset

Atalay Denknalbant, Emre Sezdi, Zeki Furkan Kutlu

2512.03298 2026-02-09 cs.LG cs.AI

Adaptive Regime-Switching Forecasts with Distribution-Free Uncertainty: Deep Switching State-Space Models Meet Conformal Prediction

Echo Diyun LU, Charles Findling, Marianne Clausel, Alessandro Leite, Wei Gong, Pierric Kersaudy

Comments v2: Added acknowledgements

2512.02213 2026-02-09 cs.LG

InstructLR: A Scalable Approach to Create Instruction Dataset for Under-Resourced Languages

Mamadou K. Keita, Sebastien Diarra, Christopher Homan, Seydou Diallo

2511.18659 2026-02-09 cs.CL

CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning

Jie He, Richard He Bai, Sinead Williamson, Jeff Z. Pan, Navdeep Jaitly, Yizhe Zhang

2511.09484 2026-02-09 cs.RO cs.CV

SPIDER: Scalable Physics-Informed Dexterous Retargeting

Chaoyi Pan, Changhao Wang, Haozhi Qi, Zixi Liu, Homanga Bharadhwaj, Akash Sharma, Tingfan Wu, Guanya Shi, Jitendra Malik, Francois Hogan

Comments Project website: https://jc-bao.github.io/spider-project/

2511.08585 2026-02-09 cs.AI cs.CV

Simulating the Visual World with Artificial Intelligence: A Roadmap

Jingtong Yue, Ziqi Huang, Zhaoxi Chen, Xintao Wang, Pengfei Wan, Ziwei Liu

Comments Project page: https://world-model-roadmap.github.io/ Github Repo: https://github.com/ziqihuangg/Awesome-From-Video-Generation-to-World-Model

Abstract

The landscape of video generation is shifting from a focus on generating visually appealing clips to building virtual environments that support interaction and maintain physical plausibility. These developments point toward the emergence of video foundation models that function not only as visual generators but also as implicit world models: models that simulate the physical dynamics, agent-environment interactions, and task planning that govern real or imagined worlds. This survey provides a systematic overview of this evolution, conceptualizing modern video foundation models as the combination of two core components: an implicit world model and a video renderer. The world model encodes structured knowledge about the world, including physical laws, interaction dynamics, and agent behavior. It serves as a latent simulation engine that enables coherent visual reasoning, long-term temporal consistency, and goal-driven planning. The video renderer transforms this latent simulation into realistic visual observations, effectively producing videos as a "window" into the simulated world. We trace the progression of video generation through four generations whose core capabilities advance step by step, culminating in a world model, built on a video generation model, that embodies intrinsic physical plausibility, real-time multimodal interaction, and planning capabilities spanning multiple spatiotemporal scales. For each generation, we define its core characteristics, highlight representative works, and examine application domains such as robotics, autonomous driving, and interactive gaming. Finally, we discuss open challenges and design principles for next-generation world models, including the role of agent intelligence in shaping and evaluating these systems. An up-to-date list of related works is maintained in the GitHub repository listed in the Comments above.